The Linux Kernel Mailing List

* Re: [PATCH v2] tracing/x86: Update syscall trace events to handle new x86 syscall func names
From: Steven Rostedt @ 2018-04-18 15:30 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Dominik Brodowski, LKML, Ingo Molnar, Thomas Gleixner,
	H. Peter Anvin, x86, Wang Nan, Alexei Starovoitov,
	Daniel Borkmann
In-Reply-To: <20180418152536.GD10084@kernel.org>

On Wed, 18 Apr 2018 12:25:36 -0300
Arnaldo Carvalho de Melo <acme@kernel.org> wrote:

> Em Wed, Apr 18, 2018 at 11:20:33AM -0400, Steven Rostedt escreveu:
> > On Wed, 18 Apr 2018 12:17:16 -0300
> > Arnaldo Carvalho de Melo <acme@kernel.org> wrote:
> >   
> > > This does the trick, by not using the main syscall routine, but one
> > > called from it and not renamed, should work with older kernels.
> > > 
> > > This test should be improved to look if the desired routine is in place,
> > > if not just skip the test and tell about the unavailability of the
> > > wanted function, but that is for later.  
> > 
> > Does this mean you can give me a "Tested-by" for that last patch?  
> 
> Here it is, just written down:
> 
> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
> 

Thanks!

-- Steve

* Re: [PATCH] [media] include/media: fix missing | operator when setting cfg
From: Sylwester Nawrocki @ 2018-04-18 15:27 UTC (permalink / raw)
  To: Colin Ian King
  Cc: Kyungmin Park, Mauro Carvalho Chehab, Kukjin Kim,
	Krzysztof Kozlowski, linux-media, linux-arm-kernel,
	linux-samsung-soc, kernel-janitors, linux-kernel
In-Reply-To: <c554a771-e9a9-fe1f-6792-e73f33b08838@canonical.com>

On 04/18/2018 05:24 PM, Colin Ian King wrote:
> Oops, shall I re-send?

There is no need to, thanks.

* [PATCH] media: cec: set ev rather than v with CEC_PIN_EVENT_FL_DROPPED bit
From: Colin King @ 2018-04-18 15:26 UTC (permalink / raw)
  To: Hans Verkuil, Mauro Carvalho Chehab, linux-media
  Cc: kernel-janitors, linux-kernel

From: Colin Ian King <colin.king@canonical.com>

Setting v with the CEC_PIN_EVENT_FL_DROPPED is incorrect, instead
ev should be set with this bit. Fix this.

Detected by CoverityScan, CID#1467974 ("Extra high-order bits")

Fixes: 6ec1cbf6b125 ("media: cec: improve CEC pin event handling")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
---
 drivers/media/cec/cec-pin.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/media/cec/cec-pin.c b/drivers/media/cec/cec-pin.c
index 2a5df99735fa..6e311424f0dc 100644
--- a/drivers/media/cec/cec-pin.c
+++ b/drivers/media/cec/cec-pin.c
@@ -119,7 +119,7 @@ static void cec_pin_update(struct cec_pin *pin, bool v, bool force)
 
 		if (pin->work_pin_events_dropped) {
 			pin->work_pin_events_dropped = false;
-			v |= CEC_PIN_EVENT_FL_DROPPED;
+			ev |= CEC_PIN_EVENT_FL_DROPPED;
 		}
 		pin->work_pin_events[pin->work_pin_events_wr] = ev;
 		pin->work_pin_ts[pin->work_pin_events_wr] = ktime_get();
-- 
2.17.0
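
A minimal userspace sketch of why the flag has to be OR-ed into ev rather
than v. It assumes, as the diff context suggests, that ev is a byte-wide
event word derived from the bool v; the flag values and function names
below are hypothetical and not taken from cec-pin.c. Because v is a bool,
OR-ing a high bit into it collapses to true and the DROPPED bit never
reaches the stored event.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag values, chosen only for illustration. */
#define EV_FL_IS_HIGH 0x01u
#define EV_FL_DROPPED 0x02u

/* Mirrors the buggy hunk: the flag is OR-ed into the bool 'v'. */
uint8_t encode_event_buggy(bool v, bool dropped)
{
	uint8_t ev = v ? EV_FL_IS_HIGH : 0;

	if (dropped)
		v |= EV_FL_DROPPED;	/* bool collapses to true; ev is untouched */
	return ev;
}

/* Mirrors the fixed hunk: the flag is OR-ed into the event word. */
uint8_t encode_event_fixed(bool v, bool dropped)
{
	uint8_t ev = v ? EV_FL_IS_HIGH : 0;

	if (dropped)
		ev |= EV_FL_DROPPED;	/* the flag is kept in the stored event */
	return ev;
}
```

With a low pin and a pending drop, the buggy variant stores 0 while the
fixed variant stores the DROPPED flag.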

* Re: [PATCH v2] tracing/x86: Update syscall trace events to handle new x86 syscall func names
From: Arnaldo Carvalho de Melo @ 2018-04-18 15:25 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Dominik Brodowski, LKML, Ingo Molnar, Thomas Gleixner,
	H. Peter Anvin, x86, Wang Nan, Alexei Starovoitov,
	Daniel Borkmann
In-Reply-To: <20180418112033.65632fef@gandalf.local.home>

Em Wed, Apr 18, 2018 at 11:20:33AM -0400, Steven Rostedt escreveu:
> On Wed, 18 Apr 2018 12:17:16 -0300
> Arnaldo Carvalho de Melo <acme@kernel.org> wrote:
> 
> > This does the trick, by not using the main syscall routine, but one
> > called from it and not renamed, should work with older kernels.
> > 
> > This test should be improved to look if the desired routine is in place,
> > if not just skip the test and tell about the unavailability of the
> > wanted function, but that is for later.
> 
> Does this mean you can give me a "Tested-by" for that last patch?

Here it is, just written down:

Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>

- Arnaldo

* Re: [PATCH] [media] include/media: fix missing | operator when setting cfg
From: Colin Ian King @ 2018-04-18 15:24 UTC (permalink / raw)
  To: Sylwester Nawrocki
  Cc: Kyungmin Park, Mauro Carvalho Chehab, Kukjin Kim,
	Krzysztof Kozlowski, linux-media, linux-arm-kernel,
	linux-samsung-soc, kernel-janitors, linux-kernel
In-Reply-To: <bafbcf6c-a08d-11ba-af25-655b7cc44e1c@samsung.com>

On 18/04/18 16:23, Sylwester Nawrocki wrote:
> On 04/18/2018 05:20 PM, Sylwester Nawrocki wrote:
>> On 04/18/2018 05:06 PM, Colin King wrote:
>>> From: Colin Ian King <colin.king@canonical.com>
>>>
>>> The value from a readl is being masked with ITE_REG_CIOCAN_MASK however
>>> this is not being used and cfg is being re-assigned.  I believe the
>>> assignment operator should actually be instead the |= operator.
>>>
>>> Detected by CoverityScan, CID#1467987 ("Unused value")
>>>
>>> Signed-off-by: Colin Ian King <colin.king@canonical.com>
>> Thanks for the patch.
>>
>> Acked-by: Sylwester Nawrocki <s.nawrocki@samsung.com>
> 
> I forgot to mention that the subject should rather look something
> like:
> 
> "exynos4-is: fimc-lite: fix missing | operator when setting cfg"
> 
Oops, shall I re-send?

* Re: [PATCH] [media] include/media: fix missing | operator when setting cfg
From: Sylwester Nawrocki @ 2018-04-18 15:23 UTC (permalink / raw)
  To: Colin King
  Cc: Kyungmin Park, Mauro Carvalho Chehab, Kukjin Kim,
	Krzysztof Kozlowski, linux-media, linux-arm-kernel,
	linux-samsung-soc, kernel-janitors, linux-kernel
In-Reply-To: <ebce8e36-9125-aecb-b0d1-87f068646e67@samsung.com>

On 04/18/2018 05:20 PM, Sylwester Nawrocki wrote:
> On 04/18/2018 05:06 PM, Colin King wrote:
>> From: Colin Ian King <colin.king@canonical.com>
>>
>> The value from a readl is being masked with ITE_REG_CIOCAN_MASK however
>> this is not being used and cfg is being re-assigned.  I believe the
>> assignment operator should actually be instead the |= operator.
>>
>> Detected by CoverityScan, CID#1467987 ("Unused value")
>>
>> Signed-off-by: Colin Ian King <colin.king@canonical.com>
> Thanks for the patch.
> 
> Acked-by: Sylwester Nawrocki <s.nawrocki@samsung.com>

I forgot to mention that the subject should rather look something
like:

"exynos4-is: fimc-lite: fix missing | operator when setting cfg"

-- 
Regards,
Sylwester

* [PATCH net-next 2/2] netns: isolate seqnums to use per-netns locks
From: Christian Brauner @ 2018-04-18 15:21 UTC (permalink / raw)
  To: ebiederm, davem, netdev, linux-kernel
  Cc: avagin, ktkhai, serge, gregkh, Christian Brauner
In-Reply-To: <20180418152106.18519-1-christian.brauner@ubuntu.com>

Now that it's possible to have a different set of uevents in different
network namespaces, per-network namespace uevent sequence numbers are
introduced. This increases performance as locking is now restricted to the
network namespace affected by the uevent rather than locking everything.

Since commit 692ec06 ("netns: send uevent messages") network namespaces not
owned by the initial user namespace can be sent uevents from a sufficiently
privileged userspace process.
In order to send a uevent into a network namespace not owned by the initial
user namespace we currently still need to take the *global mutex* that
locks the uevent socket list even though the list *only contains network
namespaces owned by the initial user namespace*. This needs to be done
because the uevent counter is a global variable. Taking the global lock is
performance sensitive since a user on the host can spawn a pool of n
processes that each create their own new user and network namespaces and then
go on to inject uevents in parallel into the network namespace of all of
these processes. This can have a significant performance impact for the
host's udevd since it means that there can be a lot of delay between a
device being added and the corresponding uevent being sent out and
available for processing by udevd. It also means that each network
namespace not owned by the initial user namespace which userspace has sent
a uevent to will need to wait until the lock becomes available.

Implementation:
This patch gives each network namespace its own uevent sequence number.
Each network namespace not owned by the initial user namespace receives its
own mutex. The struct uevent_sock is opaque to callers outside of kobject.c
so the mutex *can* and *is* only ever accessed in lib/kobject.c. In this
file it is clearly documented which lock has to be taken. All network
namespaces owned by the initial user namespace will still share the same
lock since they are all served sequentially via the uevent socket list.
This decouples the locking and ensures that the host retrieves uevents as
fast as possible even if there are a lot of uevents injected into network
namespaces not owned by the initial user namespace.  In addition, each
network namespace not owned by the initial user namespace does not have to
wait on any other network namespace not sharing the same user namespace.

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
---
 include/linux/kobject.h     |   3 --
 include/net/net_namespace.h |   3 ++
 kernel/ksysfs.c             |   3 +-
 lib/kobject_uevent.c        | 100 ++++++++++++++++++++++++++++--------
 net/core/net_namespace.c    |  13 +++++
 5 files changed, 98 insertions(+), 24 deletions(-)

diff --git a/include/linux/kobject.h b/include/linux/kobject.h
index 7f6f93c3df9c..776391aea247 100644
--- a/include/linux/kobject.h
+++ b/include/linux/kobject.h
@@ -36,9 +36,6 @@
 extern char uevent_helper[];
 #endif
 
-/* counter to tag the uevent, read only except for the kobject core */
-extern u64 uevent_seqnum;
-
 /*
  * The actions here must match the index to the string array
  * in lib/kobject_uevent.c
diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h
index 47e35cce3b64..e4e171b1ba69 100644
--- a/include/net/net_namespace.h
+++ b/include/net/net_namespace.h
@@ -85,6 +85,8 @@ struct net {
 	struct sock		*genl_sock;
 
 	struct uevent_sock	*uevent_sock;		/* uevent socket */
+	/* counter to tag the uevent, read only except for the kobject core */
+	u64                     uevent_seqnum;
 
 	struct list_head 	dev_base_head;
 	struct hlist_head 	*dev_name_head;
@@ -189,6 +191,7 @@ extern struct list_head net_namespace_list;
 
 struct net *get_net_ns_by_pid(pid_t pid);
 struct net *get_net_ns_by_fd(int fd);
+u64 get_ns_uevent_seqnum_by_vpid(void);
 
 #ifdef CONFIG_SYSCTL
 void ipx_register_sysctl(void);
diff --git a/kernel/ksysfs.c b/kernel/ksysfs.c
index 46ba853656f6..83264edcecda 100644
--- a/kernel/ksysfs.c
+++ b/kernel/ksysfs.c
@@ -19,6 +19,7 @@
 #include <linux/sched.h>
 #include <linux/capability.h>
 #include <linux/compiler.h>
+#include <net/net_namespace.h>
 
 #include <linux/rcupdate.h>	/* rcu_expedited and rcu_normal */
 
@@ -33,7 +34,7 @@ static struct kobj_attribute _name##_attr = \
 static ssize_t uevent_seqnum_show(struct kobject *kobj,
 				  struct kobj_attribute *attr, char *buf)
 {
-	return sprintf(buf, "%llu\n", (unsigned long long)uevent_seqnum);
+	return sprintf(buf, "%llu\n", (unsigned long long)get_ns_uevent_seqnum_by_vpid());
 }
 KERNEL_ATTR_RO(uevent_seqnum);
 
diff --git a/lib/kobject_uevent.c b/lib/kobject_uevent.c
index f5f5038787ac..796fd502c227 100644
--- a/lib/kobject_uevent.c
+++ b/lib/kobject_uevent.c
@@ -29,21 +29,38 @@
 #include <net/net_namespace.h>
 
 
-u64 uevent_seqnum;
 #ifdef CONFIG_UEVENT_HELPER
 char uevent_helper[UEVENT_HELPER_PATH_LEN] = CONFIG_UEVENT_HELPER_PATH;
 #endif
 
+/*
+ * Size a buffer needs to be in order to hold the largest possible sequence
+ * number stored in a u64 including \0 byte: 2^64 - 1 = 21 chars.
+ */
+#define SEQNUM_BUFSIZE (sizeof("SEQNUM=") + 21)
 struct uevent_sock {
 	struct list_head list;
 	struct sock *sk;
+	/*
+	 * This mutex protects uevent sockets and the uevent counter of
+	 * network namespaces *not* owned by init_user_ns.
+	 * For network namespaces owned by init_user_ns this lock is *not*
+	 * valid instead the global uevent_sock_mutex must be used!
+	 */
+	struct mutex sk_mutex;
 };
 
 #ifdef CONFIG_NET
 static LIST_HEAD(uevent_sock_list);
 #endif
 
-/* This lock protects uevent_seqnum and uevent_sock_list */
+/*
+ * This mutex protects uevent sockets and the uevent counter of network
+ * namespaces owned by init_user_ns.
+ * For network namespaces not owned by init_user_ns this lock is *not*
+ * valid instead the network namespace specific sk_mutex in struct
+ * uevent_sock must be used!
+ */
 static DEFINE_MUTEX(uevent_sock_mutex);
 
 /* the strings here must match the enum in include/linux/kobject.h */
@@ -253,6 +270,22 @@ static int kobj_bcast_filter(struct sock *dsk, struct sk_buff *skb, void *data)
 
 	return 0;
 }
+
+static bool can_hold_seqnum(const struct kobj_uevent_env *env, size_t len)
+{
+	if (env->envp_idx >= ARRAY_SIZE(env->envp)) {
+		WARN(1, KERN_ERR "Failed to append sequence number. "
+		     "Too many uevent variables\n");
+		return false;
+	}
+
+	if ((env->buflen + len) > UEVENT_BUFFER_SIZE) {
+		WARN(1, KERN_ERR "Insufficient space to append sequence number\n");
+		return false;
+	}
+
+	return true;
+}
 #endif
 
 #ifdef CONFIG_UEVENT_HELPER
@@ -308,18 +341,22 @@ static int kobject_uevent_net_broadcast(struct kobject *kobj,
 
 	/* send netlink message */
 	list_for_each_entry(ue_sk, &uevent_sock_list, list) {
+		/* bump sequence number */
+		u64 seqnum = ++sock_net(ue_sk->sk)->uevent_seqnum;
 		struct sock *uevent_sock = ue_sk->sk;
+		char buf[SEQNUM_BUFSIZE];
 
 		if (!netlink_has_listeners(uevent_sock, 1))
 			continue;
 
 		if (!skb) {
-			/* allocate message with the maximum possible size */
+			/* calculate header length */
 			size_t len = strlen(action_string) + strlen(devpath) + 2;
 			char *scratch;
 
+			/* allocate message with the maximum possible size */
 			retval = -ENOMEM;
-			skb = alloc_skb(len + env->buflen, GFP_KERNEL);
+			skb = alloc_skb(len + env->buflen + SEQNUM_BUFSIZE, GFP_KERNEL);
 			if (!skb)
 				continue;
 
@@ -327,11 +364,24 @@ static int kobject_uevent_net_broadcast(struct kobject *kobj,
 			scratch = skb_put(skb, len);
 			sprintf(scratch, "%s@%s", action_string, devpath);
 
+			/* add env */
 			skb_put_data(skb, env->buf, env->buflen);
 
 			NETLINK_CB(skb).dst_group = 1;
 		}
 
+		/* prepare netns seqnum */
+		retval = snprintf(buf, SEQNUM_BUFSIZE, "SEQNUM=%llu", seqnum);
+		if (retval < 0 || retval >= SEQNUM_BUFSIZE)
+			continue;
+		retval++;
+
+		if (!can_hold_seqnum(env, retval))
+			continue;
+
+		/* append netns seqnum */
+		skb_put_data(skb, buf, retval);
+
 		retval = netlink_broadcast_filtered(uevent_sock, skb_get(skb),
 						    0, 1, GFP_KERNEL,
 						    kobj_bcast_filter,
@@ -339,6 +389,9 @@ static int kobject_uevent_net_broadcast(struct kobject *kobj,
 		/* ENOBUFS should be handled in userspace */
 		if (retval == -ENOBUFS || retval == -ESRCH)
 			retval = 0;
+
+		/* remove netns seqnum */
+		skb_trim(skb, env->buflen);
 	}
 	consume_skb(skb);
 #endif
@@ -510,14 +563,7 @@ int kobject_uevent_env(struct kobject *kobj, enum kobject_action action,
 	}
 
 	mutex_lock(&uevent_sock_mutex);
-	/* we will send an event, so request a new sequence number */
-	retval = add_uevent_var(env, "SEQNUM=%llu", (unsigned long long)++uevent_seqnum);
-	if (retval) {
-		mutex_unlock(&uevent_sock_mutex);
-		goto exit;
-	}
-	retval = kobject_uevent_net_broadcast(kobj, env, action_string,
-					      devpath);
+	retval = kobject_uevent_net_broadcast(kobj, env, action_string, devpath);
 	mutex_unlock(&uevent_sock_mutex);
 
 #ifdef CONFIG_UEVENT_HELPER
@@ -605,17 +651,18 @@ int add_uevent_var(struct kobj_uevent_env *env, const char *format, ...)
 EXPORT_SYMBOL_GPL(add_uevent_var);
 
 #if defined(CONFIG_NET)
-static int uevent_net_broadcast(struct sock *usk, struct sk_buff *skb,
+static int uevent_net_broadcast(struct uevent_sock *ue_sk, struct sk_buff *skb,
 				struct netlink_ext_ack *extack)
 {
-	/* u64 to chars: 2^64 - 1 = 21 chars */
-	char buf[sizeof("SEQNUM=") + 21];
+	struct sock *usk = ue_sk->sk;
+	char buf[SEQNUM_BUFSIZE];
 	struct sk_buff *skbc;
 	int ret;
 
 	/* bump and prepare sequence number */
-	ret = snprintf(buf, sizeof(buf), "SEQNUM=%llu", ++uevent_seqnum);
-	if (ret < 0 || (size_t)ret >= sizeof(buf))
+	ret = snprintf(buf, SEQNUM_BUFSIZE, "SEQNUM=%llu",
+		       ++sock_net(ue_sk->sk)->uevent_seqnum);
+	if (ret < 0 || ret >= SEQNUM_BUFSIZE)
 		return -ENOMEM;
 	ret++;
 
@@ -668,9 +715,15 @@ static int uevent_net_rcv_skb(struct sk_buff *skb, struct nlmsghdr *nlh,
 		return -EPERM;
 	}
 
-	mutex_lock(&uevent_sock_mutex);
-	ret = uevent_net_broadcast(net->uevent_sock->sk, skb, extack);
-	mutex_unlock(&uevent_sock_mutex);
+	if (net->user_ns == &init_user_ns)
+		mutex_lock(&uevent_sock_mutex);
+	else
+		mutex_lock(&net->uevent_sock->sk_mutex);
+	ret = uevent_net_broadcast(net->uevent_sock, skb, extack);
+	if (net->user_ns == &init_user_ns)
+		mutex_unlock(&uevent_sock_mutex);
+	else
+		mutex_unlock(&net->uevent_sock->sk_mutex);
 
 	return ret;
 }
@@ -708,6 +761,13 @@ static int uevent_net_init(struct net *net)
 		mutex_lock(&uevent_sock_mutex);
 		list_add_tail(&ue_sk->list, &uevent_sock_list);
 		mutex_unlock(&uevent_sock_mutex);
+	} else {
+		/*
+		 * Uevent sockets and counters for network namespaces
+		 * not owned by the initial user namespace have their
+		 * own mutex.
+		 */
+		mutex_init(&ue_sk->sk_mutex);
 	}
 
 	return 0;
diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
index a11e03f920d3..2f914804ef73 100644
--- a/net/core/net_namespace.c
+++ b/net/core/net_namespace.c
@@ -618,6 +618,19 @@ struct net *get_net_ns_by_pid(pid_t pid)
 }
 EXPORT_SYMBOL_GPL(get_net_ns_by_pid);
 
+u64 get_ns_uevent_seqnum_by_vpid(void)
+{
+	pid_t cur_pid;
+	struct net *net;
+
+	cur_pid = task_pid_vnr(current);
+	net = get_net_ns_by_pid(cur_pid);
+	if (IS_ERR(net))
+		return 0;
+
+	return net->uevent_seqnum;
+}
+
 static __net_init int net_ns_net_init(struct net *net)
 {
 #ifdef CONFIG_NET_NS
-- 
2.17.0
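
The lock selection described in the commit message can be sketched in
userspace as follows. This is a simplified model under stated assumptions,
not the kernel code: struct netns_model, global_uevent_mutex and the
helper names are hypothetical stand-ins for struct net, struct uevent_sock
and uevent_sock_mutex, and pthread mutexes replace kernel mutexes.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for struct net plus its struct uevent_sock. */
struct netns_model {
	bool owned_by_init_userns;
	uint64_t uevent_seqnum;		/* per-namespace counter */
	pthread_mutex_t sk_mutex;	/* used only if !owned_by_init_userns */
};

/* Shared by all namespaces owned by the initial user namespace. */
pthread_mutex_t global_uevent_mutex = PTHREAD_MUTEX_INITIALIZER;

void netns_model_init(struct netns_model *ns, bool owned_by_init_userns)
{
	ns->owned_by_init_userns = owned_by_init_userns;
	ns->uevent_seqnum = 0;
	pthread_mutex_init(&ns->sk_mutex, NULL);
}

/* Pick the lock that protects this namespace's socket and counter. */
pthread_mutex_t *uevent_lock_for(struct netns_model *ns)
{
	return ns->owned_by_init_userns ? &global_uevent_mutex : &ns->sk_mutex;
}

/* Bump the per-namespace sequence number under the matching lock. */
uint64_t bump_seqnum(struct netns_model *ns)
{
	pthread_mutex_t *lock = uevent_lock_for(ns);
	uint64_t seq;

	pthread_mutex_lock(lock);
	seq = ++ns->uevent_seqnum;
	pthread_mutex_unlock(lock);
	return seq;
}
```

In this model, injecting events into one container's namespace never
contends with the host's global lock, and each namespace counts its own
events independently.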

* [PATCH net-next 1/2] netns: restrict uevents
From: Christian Brauner @ 2018-04-18 15:21 UTC (permalink / raw)
  To: ebiederm, davem, netdev, linux-kernel
  Cc: avagin, ktkhai, serge, gregkh, Christian Brauner
In-Reply-To: <20180418152106.18519-1-christian.brauner@ubuntu.com>

commit 07e98962fa77 ("kobject: Send hotplug events in all network namespaces")
enabled sending hotplug events into all network namespaces back in 2010.
Over time the set of uevents that get sent into all network namespaces has
shrunk a little. We have now reached the point where hotplug events for all
devices that carry a namespace tag are filtered according to that
namespace. Specifically, they are filtered whenever the namespace tag of
the kobject does not match the namespace tag of the netlink socket. One
example are network devices. Uevents for network devices only show up in
the network namespaces these devices are moved to or created in.

However, any uevent for a kobject that does not have a namespace tag
associated with it will not be filtered and we will broadcast it into all
network namespaces. This behavior stopped making sense when user namespaces
were introduced.

This patch restricts uevents to the initial user namespace for a couple of
reasons that have been extensively discussed on the mailing list [1].
- Thundering herd:
  Broadcasting uevents into all network namespaces introduces significant
  overhead.
  All processes that listen to uevents running in non-initial user
  namespaces will end up responding to uevents that will be meaningless to
  them. Mainly, because non-initial user namespaces cannot easily manage
  devices unless they have a privileged host-process helping them out. This
  means that there will be a thundering herd of activity when there
  shouldn't be any.
- Uevents from non-root users are already filtered in userspace:
  Uevents are filtered by userspace in a user namespace because the
  received uid != 0. Instead the uid associated with the event will be
  65534 == "nobody" because the global root uid is not mapped.
  This means we can safely and without introducing regressions modify the
  kernel to not send uevents into all network namespaces whose owning user
  namespace is not the initial user namespace because we know that
  userspace will ignore the message because of the uid anyway. I have
  verified a) that this is true for every udev implementation out there
  and b) that this behavior has been present in all udev implementations
  from the very beginning.
- Removing needless overhead/Increasing performance:
  Currently, the uevent socket for each network namespace is added to the
  global variable uevent_sock_list. The list itself needs to be protected
  by a mutex. So every time a uevent is generated the mutex is taken on the
  list. The mutex is held *from the creation of the uevent (memory
  allocation, string creation etc.) until all uevent sockets have been
  handled*. This is aggravated by the fact that for each uevent socket that
  has listeners the mc_list must be walked as well which means we're
  talking O(n^2) here. Given that a standard Linux workload usually has
  quite a lot of network namespaces and - in the face of containers - a lot
  of user namespaces this quickly becomes a performance problem (see
  "Thundering herd" above). By just recording uevent sockets of network
  namespaces that are owned by the initial user namespace we significantly
  increase performance in this codepath.
- Injecting uevents:
  There's a valid argument that containers might be interested in receiving
  device events especially if they are delegated to them by a privileged
  userspace process. One prime example is SR-IOV enabled devices that are
  explicitly designed to be handed off to other users such as VMs or
  containers.
  This use-case can now be correctly handled since
  commit 692ec06d7c92 ("netns: send uevent messages"). This commit
  introduced the ability to send uevents from userspace. As such we can let
  a sufficiently privileged (CAP_SYS_ADMIN in the owning user namespace of
  the network namespace of the netlink socket) userspace process make a
  decision what uevents should be sent. This removes the need to blindly
  broadcast uevents into all user namespaces and provides a performant and
  safe solution to this problem.
- Filtering logic:
  This patch filters by *owning user namespace of the network namespace a
  given task resides in* and not by user namespace of the task per se. This
  means if the user namespace of a given task is unshared but the network
  namespace is kept and is owned by the initial user namespace a listener
  that is opening the uevent socket in that network namespace can still
  listen to uevents.

[1]: https://lkml.org/lkml/2018/4/4/739
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
---
 lib/kobject_uevent.c | 18 ++++++++++++------
 1 file changed, 12 insertions(+), 6 deletions(-)

diff --git a/lib/kobject_uevent.c b/lib/kobject_uevent.c
index 15ea216a67ce..f5f5038787ac 100644
--- a/lib/kobject_uevent.c
+++ b/lib/kobject_uevent.c
@@ -703,9 +703,13 @@ static int uevent_net_init(struct net *net)
 
 	net->uevent_sock = ue_sk;
 
-	mutex_lock(&uevent_sock_mutex);
-	list_add_tail(&ue_sk->list, &uevent_sock_list);
-	mutex_unlock(&uevent_sock_mutex);
+	/* Restrict uevents to initial user namespace. */
+	if (sock_net(ue_sk->sk)->user_ns == &init_user_ns) {
+		mutex_lock(&uevent_sock_mutex);
+		list_add_tail(&ue_sk->list, &uevent_sock_list);
+		mutex_unlock(&uevent_sock_mutex);
+	}
+
 	return 0;
 }
 
@@ -713,9 +717,11 @@ static void uevent_net_exit(struct net *net)
 {
 	struct uevent_sock *ue_sk = net->uevent_sock;
 
-	mutex_lock(&uevent_sock_mutex);
-	list_del(&ue_sk->list);
-	mutex_unlock(&uevent_sock_mutex);
+	if (sock_net(ue_sk->sk)->user_ns == &init_user_ns) {
+		mutex_lock(&uevent_sock_mutex);
+		list_del(&ue_sk->list);
+		mutex_unlock(&uevent_sock_mutex);
+	}
 
 	netlink_kernel_release(ue_sk->sk);
 	kfree(ue_sk);
-- 
2.17.0
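
The registration rule this patch introduces can be sketched in userspace
as below. It is a minimal model, assuming a fixed-size array in place of
the kernel's linked list and omitting locking; struct ns_sock, struct
sock_list and the function names are hypothetical. Only sockets whose
network namespace is owned by the initial user namespace are recorded, so
a broadcast walks fewer entries.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_SOCKS 16

/* Hypothetical stand-in for a per-namespace uevent socket. */
struct ns_sock {
	int id;
	bool owned_by_init_userns;
};

/* Stand-in for the global uevent_sock_list (array instead of list). */
struct sock_list {
	const struct ns_sock *entries[MAX_SOCKS];
	size_t count;
};

/* Record the socket only if its netns is owned by the initial user ns. */
bool sock_list_register(struct sock_list *list, const struct ns_sock *s)
{
	if (!s->owned_by_init_userns)
		return false;	/* never added, so never broadcast to */
	if (list->count >= MAX_SOCKS)
		return false;
	list->entries[list->count++] = s;
	return true;
}

/* Number of sockets a kernel-generated uevent would be broadcast to. */
size_t sock_list_broadcast_targets(const struct sock_list *list)
{
	return list->count;
}
```

A namespace that is skipped here can still receive events that a
privileged userspace process injects explicitly, which is the use case
the commit message describes for containers.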

* [PATCH net-next 0/2] netns: uevent performance tweaks
From: Christian Brauner @ 2018-04-18 15:21 UTC (permalink / raw)
  To: ebiederm, davem, netdev, linux-kernel
  Cc: avagin, ktkhai, serge, gregkh, Christian Brauner

Hey,

This series deals with a bunch of performance improvements when sending out
uevents that have been extensively discussed here:
https://lkml.org/lkml/2018/4/10/592

- Only record uevent sockets from network namespaces owned by the
  initial user namespace in the global uevent socket list.
  Eric, this is the exact patch we agreed upon in
  https://lkml.org/lkml/2018/4/10/592.
  **A very detailed rationale is present in the commit message for
    [PATCH 1/2] netns: restrict uevents**
- Decouple the locking for network namespaces in the global uevent socket
  list from the locking for network namespaces not in the global uevent
  socket list.
  **A very detailed rationale is present in the commit message
    [PATCH 2/2] netns: isolate seqnums to use per-netns locks**

Thanks!
Christian

Christian Brauner (2):
  netns: restrict uevents
  netns: isolate seqnums to use per-netns locks

 include/linux/kobject.h     |   3 -
 include/net/net_namespace.h |   3 +
 kernel/ksysfs.c             |   3 +-
 lib/kobject_uevent.c        | 118 ++++++++++++++++++++++++++++--------
 net/core/net_namespace.c    |  13 ++++
 5 files changed, 110 insertions(+), 30 deletions(-)

-- 
2.17.0

* Re: [PATCH v2] tracing/x86: Update syscall trace events to handle new x86 syscall func names
From: Steven Rostedt @ 2018-04-18 15:20 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Dominik Brodowski, LKML, Ingo Molnar, Thomas Gleixner,
	H. Peter Anvin, x86, Wang Nan, Alexei Starovoitov,
	Daniel Borkmann
In-Reply-To: <20180418151716.GC10084@kernel.org>

On Wed, 18 Apr 2018 12:17:16 -0300
Arnaldo Carvalho de Melo <acme@kernel.org> wrote:

> This does the trick, by not using the main syscall routine, but one
> called from it and not renamed, should work with older kernels.
> 
> This test should be improved to look if the desired routine is in place,
> if not just skip the test and tell about the unavailability of the
> wanted function, but that is for later.

Does this mean you can give me a "Tested-by" for that last patch?

-- Steve

* Re: [PATCH] [media] include/media: fix missing | operator when setting cfg
From: Sylwester Nawrocki @ 2018-04-18 15:20 UTC (permalink / raw)
  To: Colin King
  Cc: Kyungmin Park, Mauro Carvalho Chehab, Kukjin Kim,
	Krzysztof Kozlowski, linux-media, linux-arm-kernel,
	linux-samsung-soc, kernel-janitors, linux-kernel
In-Reply-To: <20180418150617.22489-1-colin.king@canonical.com>

On 04/18/2018 05:06 PM, Colin King wrote:
> From: Colin Ian King <colin.king@canonical.com>
> 
> The value from a readl is being masked with ITE_REG_CIOCAN_MASK however
> this is not being used and cfg is being re-assigned.  I believe the
> assignment operator should actually be instead the |= operator.
> 
> Detected by CoverityScan, CID#1467987 ("Unused value")
> 
> Signed-off-by: Colin Ian King <colin.king@canonical.com>

Thanks for the patch.

Acked-by: Sylwester Nawrocki <s.nawrocki@samsung.com>
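
A minimal sketch of the bug class the patch description refers to. It
assumes the usual read-modify-write pattern (clear a field with a mask,
then OR in the new value); the mask value and function names are
hypothetical and not taken from the fimc-lite driver. With a plain
assignment the masked register value is thrown away, which is what the
"Unused value" report points at.

```c
#include <stdint.h>

/* Hypothetical field mask, for illustration only. */
#define REG_FIELD_MASK 0x000000f0u

/* Buggy: the masked register value is overwritten by '='. */
uint32_t update_cfg_buggy(uint32_t reg, uint32_t field)
{
	uint32_t cfg = reg & ~REG_FIELD_MASK;	/* clear the field */

	cfg = field;	/* bug: discards every other bit of reg */
	return cfg;
}

/* Fixed: '|=' merges the new field into the preserved bits. */
uint32_t update_cfg_fixed(uint32_t reg, uint32_t field)
{
	uint32_t cfg = reg & ~REG_FIELD_MASK;	/* clear the field */

	cfg |= field;	/* keep the remaining bits of reg */
	return cfg;
}
```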

* Re: [PATCH 1/2] tracing: fix bad use of igrab in trace_uprobe.c
From: Steven Rostedt @ 2018-04-18 15:19 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Song Liu, LKML, kernel-team, Ingo Molnar, Howard McLauchlan,
	Josef Bacik, Srikar Dronamraju
In-Reply-To: <CAJfpegtKsMBjqfg6Ewz3o9197GyqS6wTauuuMExj3DDZU+CPFA@mail.gmail.com>

On Wed, 18 Apr 2018 16:40:19 +0200
Miklos Szeredi <miklos@szeredi.hu> wrote:

> > The trace_uprobe (the probe event) is created, it doesn't do anything
> > until it is enabled. This function is called when it is enabled. The
> > trace_uprobe (probe event) can not be deleted while it is enabled
> > (EBUSY).
> >
> > Are you asking what happens if the file is deleted while it has probe?
> > That I don't know about (haven't tried it out). But I would hope that
> > it keeps a reference to the inode, isn't that what the igrab is for?
> > And is now being replaced by a reference on the path, or is that the
> > problem?  
> 
> No, that's not the problem.
> 
> What I don't see is how the uprobe object relates to the trace_uprobe object.
> 
> Because after the patch the uprobe object still only has a ref to the
> inode, and that can lead to the same issue as with trace_uprobe.
> OTOH if uprobe can't survive its creating trace_uprobe, then it
> doesn't need to take a ref to the inode at all, since trace_uprobe
> already holds it.   Taking an extra ref isn't incorrect, it's just
> unnecessary and confusing.
> 
> So this needs to be cleared up in some way.

The uprobe created by the trace_uprobe creation must be deleted before
the trace_uprobe can be deleted. Basically we have this:

 # cd /sys/kernel/tracing
 # echo "uprobe creation text" > uprobe_events

The trace_uprobe is created (but not the uprobe itself). This is what
calls create_trace_uprobe().

 # echo 1 > events/uprobes/enable

This enables all the trace uprobe events, which creates the uprobes.
This is the action that calls probe_event_enable(), which creates
uprobes.

At this point, any write to uprobe_events that would destroy the trace
uprobes would return with -EBUSY, and the trace uprobes will not be
deleted.

 # echo 0 > events/uprobes/enable

This will call the probe_event_disable() which will call
uprobe_unregister() which will destroy the uprobe.

Now we can delete the trace uprobe.

Does that answer your question? A uprobe created for trace uprobes can
not survive the trace uprobe itself.

-- Steve
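
The lifecycle described above can be modeled as a small state machine.
This is a userspace sketch under stated assumptions, not kernel code:
struct trace_uprobe_model and the tu_* helpers are hypothetical names
that stand in for create_trace_uprobe(), probe_event_enable(),
probe_event_disable() and the delete path.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of one trace uprobe and the uprobe it owns. */
struct trace_uprobe_model {
	bool exists;		/* written to uprobe_events */
	bool enabled;		/* events/uprobes/enable == 1 */
	bool uprobe_registered;	/* underlying uprobe is alive */
};

int tu_create(struct trace_uprobe_model *tu)
{
	if (tu->exists)
		return -EEXIST;
	tu->exists = true;	/* no uprobe is created yet */
	return 0;
}

int tu_enable(struct trace_uprobe_model *tu)
{
	if (!tu->exists)
		return -ENOENT;
	tu->enabled = true;
	tu->uprobe_registered = true;	/* enabling creates the uprobe */
	return 0;
}

int tu_disable(struct trace_uprobe_model *tu)
{
	if (!tu->exists)
		return -ENOENT;
	tu->enabled = false;
	tu->uprobe_registered = false;	/* disabling destroys the uprobe */
	return 0;
}

int tu_delete(struct trace_uprobe_model *tu)
{
	if (!tu->exists)
		return -ENOENT;
	if (tu->enabled)
		return -EBUSY;	/* cannot delete while enabled */
	tu->exists = false;
	return 0;
}
```

In this model the uprobe is registered only between enable and disable,
so it can never outlive the trace uprobe that created it.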

* Re: [PATCH v2] tracing/x86: Update syscall trace events to handle new x86 syscall func names
From: Arnaldo Carvalho de Melo @ 2018-04-18 15:17 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Dominik Brodowski, LKML, Ingo Molnar, Thomas Gleixner,
	H. Peter Anvin, x86, Wang Nan, Alexei Starovoitov,
	Daniel Borkmann
In-Reply-To: <20180418150212.GA10084@kernel.org>

Em Wed, Apr 18, 2018 at 12:02:12PM -0300, Arnaldo Carvalho de Melo escreveu:
> Em Wed, Apr 18, 2018 at 10:36:06AM -0400, Steven Rostedt escreveu:
> > On Wed, 18 Apr 2018 09:53:22 -0300
> > Arnaldo Carvalho de Melo <acme@kernel.org> wrote:
> > > Em Tue, Apr 17, 2018 at 05:41:28PM -0400, Steven Rostedt escreveu:
> > > > On Tue, 17 Apr 2018 15:13:04 -0300 Arnaldo Carvalho de Melo <acme@kernel.org> wrote:  
> > > > > Yeah, failing:  
> 
> > > > > [root@jouet ~]# strace -e openat -e file perf test -F -v "mmap interface" |& grep syscalls
> > > > > openat(AT_FDCWD, "/sys/kernel/debug/tracing/events/syscalls/sys_enter_getsid/format", O_RDONLY) = 3
> > > > > openat(AT_FDCWD, "/sys/kernel/debug/tracing/events/syscalls/sys_enter_getppid/format", O_RDONLY) = -1 ENOENT (No such file or directory)  
> > >  
> > > > It doesn't have to do with the number of parameters, not everything
> > > > has "__x64" on it.  
> 
> > > > Try this patch:  
> 
> > > Trying...
>  
> > You're keeping me in suspense!
> 
> I switched locations, had trouble reconnecting, those tests are ok now,
> there is just one case left, related to the syscall routines renames,
> but not related to the syscalls:sys_{enter,exit}_NAME tracepoints:
> 40: BPF filter                                            :
> 40.1: Basic BPF filtering                                 : FAILED!
> 40.2: BPF pinning                                         : Skip
> 40.3: BPF prologue generation                             : Skip
> 40.4: BPF relocation checker                              : Skip
> 
> If we use -v for that test we see the problem:
> 
> To the point:
> 
>   Probe point 'SyS_epoll_pwait' not found.
> 
> This is not there anymore, I'll change this test to first figure out
> what is the syscall routine for the epoll_pwait syscall so that it works
> with pre-syscall-routines-rename and after that.

This does the trick: instead of probing the main syscall routine, it
probes one that is called from it and was not renamed, so it should also
work with older kernels.

This test should be improved to check whether the desired routine is in
place and, if not, skip the test and report that the wanted function is
unavailable, but that is for later.
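
A rough sketch of what such a check could look like, in plain user-space C
and not taken from the perf sources: given a list of candidate routine names
in order of preference and a list of symbols known to exist (which could,
for instance, be gathered from /proc/kallsyms), pick the first candidate
that is present, or report that none is so the caller can skip the test.
The function name pick_probe_target() and both lists are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: return the first name in 'candidates' that also
 * appears in 'available', or NULL when none does (caller should then
 * skip the test instead of failing it).
 */
static const char *pick_probe_target(const char *const *candidates, size_t ncand,
				     const char *const *available, size_t navail)
{
	for (size_t i = 0; i < ncand; i++)
		for (size_t j = 0; j < navail; j++)
			if (strcmp(candidates[i], available[j]) == 0)
				return candidates[i];

	return NULL;
}
```

With candidates { "SyS_epoll_pwait", "do_epoll_wait" }, a kernel that still
has the old name would get the old probe point, and a renamed kernel would
fall through to do_epoll_wait.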

- Arnaldo

diff --git a/tools/perf/tests/bpf-script-example.c b/tools/perf/tests/bpf-script-example.c
index e4123c1b0e88..1ca5106df5f1 100644
--- a/tools/perf/tests/bpf-script-example.c
+++ b/tools/perf/tests/bpf-script-example.c
@@ -31,7 +31,7 @@ struct bpf_map_def SEC("maps") flip_table = {
 	.max_entries = 1,
 };
 
-SEC("func=SyS_epoll_pwait")
+SEC("func=do_epoll_wait")
 int bpf_func__SyS_epoll_pwait(void *ctx)
 {
 	int ind =0;

^ permalink raw reply related

* Re: [PATCH v2 0/5] ALSA: xen-front: Add Xen para-virtualized frontend driver
From: Oleksandr Andrushchenko @ 2018-04-18 15:15 UTC (permalink / raw)
  To: xen-devel, linux-kernel, alsa-devel, jgross, boris.ostrovsky,
	konrad.wilk, perex, tiwai
  Cc: Oleksandr Andrushchenko
In-Reply-To: <20180416062453.24743-1-andr2000@gmail.com>

On 04/16/2018 09:24 AM, Oleksandr Andrushchenko wrote:
> From: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
>
> Please note: this patch series depends on [3].
The dependency is now merged into Xen kernel tree [4] for-linus-4.17
>
> This patch series adds support for Xen [1] para-virtualized
> sound frontend driver. It implements the protocol from
> include/xen/interface/io/sndif.h with the following limitations:
> - mute/unmute is not supported
> - get/set volume is not supported
> Volume control is not supported for the reason that most of the
> use-cases (at the moment) are based on scenarios where
> unprivileged OS (e.g. Android, AGL etc) use software mixers.
>
> Both capture and playback are supported.
>
> Corresponding backend, implemented as a user-space application, can be
> found at [2].
>
> Thank you,
> Oleksandr
>
> Changes since v1:
> *****************
>
> 1. Moved driver from sound/drivers to sound/xen
>
> 2. Coding style changes to better match the Linux kernel coding style
>
> 3. Added explicit back and front synchronization
>     In order to provide explicit synchronization between backend and
>     frontend the following changes are introduced in the protocol:
>      - add new ring buffer for sending asynchronous events from
>        backend to frontend to report number of bytes played by the
>        frontend (XENSND_EVT_CUR_POS)
>      - introduce trigger events for playback control: start/stop/pause/resume
>      - add "req-" prefix to event-channel and ring-ref to unify naming
>        of the Xen event channels for requests and events
>
> 4. Added explicit back and front parameter negotiation
>     In order to provide explicit stream parameter negotiation between
>     backend and frontend the following changes are introduced in the protocol:
>     add XENSND_OP_HW_PARAM_QUERY request to read/update
>     configuration space for the parameters given: request passes
>     desired parameter's intervals/masks and the response to this request
>     returns allowed min/max intervals/masks to be used.
>
> [1] https://xenproject.org/
> [2] https://github.com/xen-troops/snd_be
> [3] https://lkml.org/lkml/2018/4/12/522
>
> Oleksandr Andrushchenko (5):
>    ALSA: xen-front: Introduce Xen para-virtualized sound frontend driver
>    ALSA: xen-front: Read sound driver configuration from Xen store
>    ALSA: xen-front: Implement Xen event channel handling
>    ALSA: xen-front: Implement handling of shared buffers
>    ALSA: xen-front: Implement ALSA virtual sound driver
>
>   sound/Kconfig                     |   2 +
>   sound/Makefile                    |   2 +-
>   sound/xen/Kconfig                 |  10 +
>   sound/xen/Makefile                |   9 +
>   sound/xen/xen_snd_front.c         | 410 +++++++++++++++++++
>   sound/xen/xen_snd_front.h         |  57 +++
>   sound/xen/xen_snd_front_alsa.c    | 830 ++++++++++++++++++++++++++++++++++++++
>   sound/xen/xen_snd_front_alsa.h    |  23 ++
>   sound/xen/xen_snd_front_cfg.c     | 517 ++++++++++++++++++++++++
>   sound/xen/xen_snd_front_cfg.h     |  46 +++
>   sound/xen/xen_snd_front_evtchnl.c | 478 ++++++++++++++++++++++
>   sound/xen/xen_snd_front_evtchnl.h |  92 +++++
>   sound/xen/xen_snd_front_shbuf.c   | 193 +++++++++
>   sound/xen/xen_snd_front_shbuf.h   |  36 ++
>   14 files changed, 2704 insertions(+), 1 deletion(-)
>   create mode 100644 sound/xen/Kconfig
>   create mode 100644 sound/xen/Makefile
>   create mode 100644 sound/xen/xen_snd_front.c
>   create mode 100644 sound/xen/xen_snd_front.h
>   create mode 100644 sound/xen/xen_snd_front_alsa.c
>   create mode 100644 sound/xen/xen_snd_front_alsa.h
>   create mode 100644 sound/xen/xen_snd_front_cfg.c
>   create mode 100644 sound/xen/xen_snd_front_cfg.h
>   create mode 100644 sound/xen/xen_snd_front_evtchnl.c
>   create mode 100644 sound/xen/xen_snd_front_evtchnl.h
>   create mode 100644 sound/xen/xen_snd_front_shbuf.c
>   create mode 100644 sound/xen/xen_snd_front_shbuf.h
>
[4] 
https://git.kernel.org/pub/scm/linux/kernel/git/xen/tip.git/commit/?h=for-linus-4.17&id=cd6e992b3aab072cc90839508aaf5573c8f7e066

^ permalink raw reply

* Re: [PATCH] powerpc: Allow selection of CONFIG_LD_DEAD_CODE_DATA_ELIMINATION
From: Michael Ellerman @ 2018-04-18 15:13 UTC (permalink / raw)
  To: Mathieu Malaterre, Christophe LEROY
  Cc: Benjamin Herrenschmidt, Paul Mackerras, linuxppc-dev, LKML
In-Reply-To: <CA+7wUszJzG4CyfO2_Tvmc_u9oMPxLQMaiP+HnhpEex6KCRgSYw@mail.gmail.com>

Mathieu Malaterre <malat@debian.org> writes:
> On Wed, Apr 18, 2018 at 8:34 AM, Christophe LEROY
...
>
>> Can you also provide a copy of the messages you can see (prom_init ...) when
>> boot is ok ?
>
> Hum. I've always been interested in seeing it myself as well. Is there a
> way to set up the environment to see those messages (netconsole, delayed boot
> messages ...)? I never found clear documentation on how to do that
> on (closed) Apple hardware.

If you see nothing after prom_init it usually indicates the kernel died
very early in boot before it could find the console.

The only option then is to enable one of the hard-coded EARLY_DEBUG
options.

I don't know which one works on a G4, maybe CONFIG_PPC_EARLY_DEBUG_BOOTX ?

I assume it doesn't have a serial port.

cheers

^ permalink raw reply

* Re: [PATCH] SLUB: Do not fallback to mininum order if __GFP_NORETRY is set
From: Christopher Lameter @ 2018-04-18 15:11 UTC (permalink / raw)
  To: Mikulas Patocka
  Cc: Vlastimil Babka, Mike Snitzer, Matthew Wilcox, Pekka Enberg,
	linux-mm, dm-devel, David Rientjes, Joonsoo Kim, Andrew Morton,
	linux-kernel
In-Reply-To: <alpine.LRH.2.02.1804181102490.13213@file01.intranet.prod.int.rdu2.redhat.com>

On Wed, 18 Apr 2018, Mikulas Patocka wrote:

> No, this would hit NULL pointer dereference if page is NULL and
> __GFP_NORETRY is set. You want this:

You are right

Acked-by: Christoph Lameter <cl@linux.com>

^ permalink raw reply

* Re: [PATCH 2/2] printk: wake up klogd in vprintk_emit
From: Steven Rostedt @ 2018-04-18 15:10 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sergey Senozhatsky, Andrew Morton, Peter Zijlstra, Tejun Heo,
	linux-kernel
In-Reply-To: <20180418150214.z7oyughldrktj6e4@pathway.suse.cz>

On Wed, 18 Apr 2018 17:02:14 +0200
Petr Mladek <pmladek@suse.com> wrote:

> > Calling wake_up_klogd() will grab the rq lock and give us a A-B<->B-A
> > locking order.  
> 
> wake_up_klogd() uses the lockless irq_work_queue(). So it is actually
> safe.

I didn't look at the code. OK then we don't need to worry about that.

> 
> But the name is confusing. We should rename it.

Yes, I would because the old wake_up_klogd() did do a wakeup. Perhaps
we should name it: kick_klogd().

-- Steve

^ permalink raw reply

* Re: [RFC] perf/core: what is exclude_idle supposed to do
From: Vince Weaver @ 2018-04-18 15:10 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Stephane Eranian, LKML, Peter Zijlstra, Arnaldo Carvalho de Melo,
	mingo, Andi Kleen, Vince Weaver
In-Reply-To: <20180417062010.GA2052@krava>

On Tue, 17 Apr 2018, Jiri Olsa wrote:

> On Mon, Apr 16, 2018 at 10:04:53PM +0000, Stephane Eranian wrote:
> > Hi,
> > 
> > I am trying to understand what the exclude_idle event attribute is supposed
> > to accomplish.
> > As per the definition in the header file:
> > 
> >     exclude_idle   :  1, /* don't count when idle */
> 
> AFAICS it's not implemented

so just to be completely clear here, we're saying that the "exclude_idle" 
modifier has never done anything useful and still doesn't?

If so I should update the perf_event_open manpage to spell this out.

Vince

^ permalink raw reply

* Re: [PATCH net-next v4 0/3] kernel: add support to collect hardware logs in crash recovery kernel
From: Rahul Lakkireddy @ 2018-04-18 15:07 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Dave Young, netdev@vger.kernel.org, kexec@lists.infradead.org,
	linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
	Indranil Choudhury, Nirranjan Kirubaharan,
	stephen@networkplumber.org, Ganesh GR, akpm@linux-foundation.org,
	torvalds@linux-foundation.org, davem@davemloft.net,
	viro@zeniv.linux.org.uk
In-Reply-To: <871sfcy4ge.fsf@xmission.com>

On Wednesday, April 04/18/18, 2018 at 19:58:01 +0530, Eric W. Biederman wrote:
> Rahul Lakkireddy <rahul.lakkireddy@chelsio.com> writes:
> 
> > On Wednesday, April 04/18/18, 2018 at 11:45:46 +0530, Dave Young wrote:
> >> Hi Rahul,
> >> On 04/17/18 at 01:14pm, Rahul Lakkireddy wrote:
> >> > On production servers running variety of workloads over time, kernel
> >> > panic can happen sporadically after days or even months. It is
> >> > important to collect as much debug logs as possible to root cause
> >> > and fix the problem, that may not be easy to reproduce. Snapshot of
> >> > underlying hardware/firmware state (like register dump, firmware
> >> > logs, adapter memory, etc.), at the time of kernel panic will be very
> >> > helpful while debugging the culprit device driver.
> >> > 
> >> > This series of patches add new generic framework that enable device
> >> > drivers to collect device specific snapshot of the hardware/firmware
> >> > state of the underlying device in the crash recovery kernel. In crash
> >> > recovery kernel, the collected logs are added as elf notes to
> >> > /proc/vmcore, which is copied by user space scripts for post-analysis.
> >> > 
> >> > The sequence of actions done by device drivers to append their device
> >> > specific hardware/firmware logs to /proc/vmcore are as follows:
> >> > 
> >> > 1. During probe (before hardware is initialized), device drivers
> >> > register to the vmcore module (via vmcore_add_device_dump()), with
> >> > callback function, along with buffer size and log name needed for
> >> > firmware/hardware log collection.
> >> 
> >> I assumed the elf notes info should be prepared while kexec_[file_]load
> >> phase. But I did not read the old comment, not sure if it has been discussed
> >> or not.
> >> 
> >
> > We must not collect dumps in crashing kernel. Adding more things in
> > crash dump path risks not collecting vmcore at all. Eric had
> > discussed this in more detail at:
> >
> > https://lkml.org/lkml/2018/3/24/319
> >
> > We are safe to collect dumps in the second kernel. Each device dump
> > will be exported as an elf note in /proc/vmcore.
> 
> It just occurred to me there is one variation that is worth
> considering.
> 
> Is the area you are looking at dumping part of a huge mmio area?
> I think someone said 2GB?
> 
> If that is the case it could be worth it to simply add the needed
> addresses to the range of memory we need to dump, and simply having a
> elf note saying that is what happened.
> 

We are _not_ dumping mmio area. However, one part of the dump
collection involves reading 2 GB on-chip memory via PIO access,
which is compressed and stored.

Thanks,
Rahul

^ permalink raw reply

* Re: [PATCH v1 3/7] soc: mediatek: reuse read[l,x]_poll_timeout helpers
From: Matthias Brugger @ 2018-04-18 15:06 UTC (permalink / raw)
  To: sean.wang, robh+dt, mark.rutland, marcel, johan.hedberg
  Cc: devicetree, linux-bluetooth, linux-arm-kernel, linux-mediatek,
	linux-kernel, Ulf Hansson, Weiyi Lu
In-Reply-To: <2c0d233c658fa6093a18a54e82a3e51340251bc9.1522736996.git.sean.wang@mediatek.com>



On 04/03/2018 09:15 AM, sean.wang@mediatek.com wrote:
> From: Sean Wang <sean.wang@mediatek.com>
> 
> Reuse the common helpers read[l,x]_poll_timeout provided by Linux core
> instead of an open-coded handling. The name of the local variable
> sram_pdn_ack in scpsys_power_on is renamed to pdn_ack in order to be
> consistent with the one used in scpsys_power_off.
> 
> Signed-off-by: Sean Wang <sean.wang@mediatek.com>
> Cc: Matthias Brugger <matthias.bgg@gmail.com>
> Cc: Ulf Hansson <ulf.hansson@linaro.org>
> Cc: Weiyi Lu <weiyi.lu@mediatek.com>
> ---

pushed to v4.17-next/soc

Thanks!

>  drivers/soc/mediatek/mtk-scpsys.c | 91 ++++++++++-----------------------------
>  1 file changed, 23 insertions(+), 68 deletions(-)
> 
> diff --git a/drivers/soc/mediatek/mtk-scpsys.c b/drivers/soc/mediatek/mtk-scpsys.c
> index d762a46..f9b7248 100644
> --- a/drivers/soc/mediatek/mtk-scpsys.c
> +++ b/drivers/soc/mediatek/mtk-scpsys.c
> @@ -13,6 +13,7 @@
>  #include <linux/clk.h>
>  #include <linux/init.h>
>  #include <linux/io.h>
> +#include <linux/iopoll.h>
>  #include <linux/mfd/syscon.h>
>  #include <linux/of_device.h>
>  #include <linux/platform_device.h>
> @@ -27,6 +28,9 @@
>  #include <dt-bindings/power/mt7623a-power.h>
>  #include <dt-bindings/power/mt8173-power.h>
>  
> +#define MTK_POLL_DELAY_US   10
> +#define MTK_POLL_TIMEOUT    (jiffies_to_usecs(HZ))
> +
>  #define SPM_VDE_PWR_CON			0x0210
>  #define SPM_MFG_PWR_CON			0x0214
>  #define SPM_VEN_PWR_CON			0x0230
> @@ -184,12 +188,10 @@ static int scpsys_power_on(struct generic_pm_domain *genpd)
>  {
>  	struct scp_domain *scpd = container_of(genpd, struct scp_domain, genpd);
>  	struct scp *scp = scpd->scp;
> -	unsigned long timeout;
> -	bool expired;
>  	void __iomem *ctl_addr = scp->base + scpd->data->ctl_offs;
> -	u32 sram_pdn_ack = scpd->data->sram_pdn_ack_bits;
> +	u32 pdn_ack = scpd->data->sram_pdn_ack_bits;
>  	u32 val;
> -	int ret;
> +	int ret, tmp;
>  	int i;
>  
>  	if (scpd->supply) {
> @@ -215,23 +217,10 @@ static int scpsys_power_on(struct generic_pm_domain *genpd)
>  	writel(val, ctl_addr);
>  
>  	/* wait until PWR_ACK = 1 */
> -	timeout = jiffies + HZ;
> -	expired = false;
> -	while (1) {
> -		ret = scpsys_domain_is_on(scpd);
> -		if (ret > 0)
> -			break;
> -
> -		if (expired) {
> -			ret = -ETIMEDOUT;
> -			goto err_pwr_ack;
> -		}
> -
> -		cpu_relax();
> -
> -		if (time_after(jiffies, timeout))
> -			expired = true;
> -	}
> +	ret = readx_poll_timeout(scpsys_domain_is_on, scpd, tmp, tmp > 0,
> +				 MTK_POLL_DELAY_US, MTK_POLL_TIMEOUT);
> +	if (ret < 0)
> +		goto err_pwr_ack;
>  
>  	val &= ~PWR_CLK_DIS_BIT;
>  	writel(val, ctl_addr);
> @@ -246,20 +235,10 @@ static int scpsys_power_on(struct generic_pm_domain *genpd)
>  	writel(val, ctl_addr);
>  
>  	/* wait until SRAM_PDN_ACK all 0 */
> -	timeout = jiffies + HZ;
> -	expired = false;
> -	while (sram_pdn_ack && (readl(ctl_addr) & sram_pdn_ack)) {
> -
> -		if (expired) {
> -			ret = -ETIMEDOUT;
> -			goto err_pwr_ack;
> -		}
> -
> -		cpu_relax();
> -
> -		if (time_after(jiffies, timeout))
> -			expired = true;
> -	}
> +	ret = readl_poll_timeout(ctl_addr, tmp, (tmp & pdn_ack) == 0,
> +				 MTK_POLL_DELAY_US, MTK_POLL_TIMEOUT);
> +	if (ret < 0)
> +		goto err_pwr_ack;
>  
>  	if (scpd->data->bus_prot_mask) {
>  		ret = mtk_infracfg_clear_bus_protection(scp->infracfg,
> @@ -289,12 +268,10 @@ static int scpsys_power_off(struct generic_pm_domain *genpd)
>  {
>  	struct scp_domain *scpd = container_of(genpd, struct scp_domain, genpd);
>  	struct scp *scp = scpd->scp;
> -	unsigned long timeout;
> -	bool expired;
>  	void __iomem *ctl_addr = scp->base + scpd->data->ctl_offs;
>  	u32 pdn_ack = scpd->data->sram_pdn_ack_bits;
>  	u32 val;
> -	int ret;
> +	int ret, tmp;
>  	int i;
>  
>  	if (scpd->data->bus_prot_mask) {
> @@ -310,19 +287,10 @@ static int scpsys_power_off(struct generic_pm_domain *genpd)
>  	writel(val, ctl_addr);
>  
>  	/* wait until SRAM_PDN_ACK all 1 */
> -	timeout = jiffies + HZ;
> -	expired = false;
> -	while (pdn_ack && (readl(ctl_addr) & pdn_ack) != pdn_ack) {
> -		if (expired) {
> -			ret = -ETIMEDOUT;
> -			goto out;
> -		}
> -
> -		cpu_relax();
> -
> -		if (time_after(jiffies, timeout))
> -			expired = true;
> -	}
> +	ret = readl_poll_timeout(ctl_addr, tmp, (tmp & pdn_ack) == pdn_ack,
> +				 MTK_POLL_DELAY_US, MTK_POLL_TIMEOUT);
> +	if (ret < 0)
> +		goto out;
>  
>  	val |= PWR_ISO_BIT;
>  	writel(val, ctl_addr);
> @@ -340,23 +308,10 @@ static int scpsys_power_off(struct generic_pm_domain *genpd)
>  	writel(val, ctl_addr);
>  
>  	/* wait until PWR_ACK = 0 */
> -	timeout = jiffies + HZ;
> -	expired = false;
> -	while (1) {
> -		ret = scpsys_domain_is_on(scpd);
> -		if (ret == 0)
> -			break;
> -
> -		if (expired) {
> -			ret = -ETIMEDOUT;
> -			goto out;
> -		}
> -
> -		cpu_relax();
> -
> -		if (time_after(jiffies, timeout))
> -			expired = true;
> -	}
> +	ret = readx_poll_timeout(scpsys_domain_is_on, scpd, tmp, tmp == 0,
> +				 MTK_POLL_DELAY_US, MTK_POLL_TIMEOUT);
> +	if (ret < 0)
> +		goto out;
>  
>  	for (i = 0; i < MAX_CLKS && scpd->clk[i]; i++)
>  		clk_disable_unprepare(scpd->clk[i]);
> 
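
For readers unfamiliar with the iopoll helpers, the shape the patch switches
to is "read a value, test a condition, give up after a bound". Below is a
minimal user-space sketch of that shape, under stated assumptions: it bounds
the loop by a poll count rather than by elapsed microseconds so that it stays
deterministic, and struct ex_dev, ex_read_status() and ex_poll_bit_set() are
made-up names, not kernel APIs.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical device model: the ready bit appears after N reads. */
struct ex_dev {
	uint32_t status;
	unsigned int reads;
	unsigned int ready_after;
};

/* Stand-in for readl()/scpsys_domain_is_on(): returns the current status. */
static uint32_t ex_read_status(struct ex_dev *d)
{
	if (++d->reads >= d->ready_after)
		d->status |= 1u;
	return d->status;
}

/*
 * Poll until all bits in 'mask' are set or 'max_polls' reads were made.
 * Returns 0 on success and -ETIMEDOUT otherwise; the last value read is
 * stored in *val either way.
 */
static int ex_poll_bit_set(struct ex_dev *d, uint32_t mask,
			   unsigned int max_polls, uint32_t *val)
{
	for (unsigned int i = 0; i < max_polls; i++) {
		*val = ex_read_status(d);
		if ((*val & mask) == mask)
			return 0;
	}
	return -ETIMEDOUT;
}
```

The caller then only has to check the return value, which is what lets the
patch drop the open-coded timeout/expired bookkeeping at each wait site.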

^ permalink raw reply

* Re: [PATCH] nvme: fix the suspicious RCU usage warning in nvme_mpath_clear_current_path
From: Christoph Hellwig @ 2018-04-18 15:07 UTC (permalink / raw)
  To: Keith Busch; +Cc: Jianchao Wang, axboe, hch, sagi, linux-nvme, linux-kernel
In-Reply-To: <20180418144524.GL11513@localhost.localdomain>

On Wed, Apr 18, 2018 at 08:45:25AM -0600, Keith Busch wrote:
> Nothing against this patch. This just doesn't look correct even from
> before since nvme_find_path can set head->current_path right back to
> this namespace that we're trying to clear.
> 
> Christoph, am I missing something here or does this need additional
> checks/synchronization?

Yes, we should probably call it after removing the namespace from
the ns_head list, instead of right before.

^ permalink raw reply

* [PATCH] [media] include/media: fix missing | operator when setting cfg
From: Colin King @ 2018-04-18 15:06 UTC (permalink / raw)
  To: Kyungmin Park, Sylwester Nawrocki, Mauro Carvalho Chehab,
	Kukjin Kim, Krzysztof Kozlowski, linux-media, linux-arm-kernel,
	linux-samsung-soc
  Cc: kernel-janitors, linux-kernel

From: Colin Ian King <colin.king@canonical.com>

The value from a readl is being masked with FLITE_REG_CIOCAN_MASK; however,
the masked value is not being used because cfg is then re-assigned.  I believe
the assignment operator should instead be the |= operator.

Detected by CoverityScan, CID#1467987 ("Unused value")

Signed-off-by: Colin Ian King <colin.king@canonical.com>
---
 drivers/media/platform/exynos4-is/fimc-lite-reg.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/media/platform/exynos4-is/fimc-lite-reg.c b/drivers/media/platform/exynos4-is/fimc-lite-reg.c
index f0acc550d065..16565a0b4bf1 100644
--- a/drivers/media/platform/exynos4-is/fimc-lite-reg.c
+++ b/drivers/media/platform/exynos4-is/fimc-lite-reg.c
@@ -254,7 +254,7 @@ void flite_hw_set_dma_window(struct fimc_lite *dev, struct flite_frame *f)
 	/* Maximum output pixel size */
 	cfg = readl(dev->regs + FLITE_REG_CIOCAN);
 	cfg &= ~FLITE_REG_CIOCAN_MASK;
-	cfg = (f->f_height << 16) | f->f_width;
+	cfg |= (f->f_height << 16) | f->f_width;
 	writel(cfg, dev->regs + FLITE_REG_CIOCAN);
 
 	/* DMA offsets */
-- 
2.17.0
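
To illustrate the difference outside the driver, here is a small stand-alone
sketch of the two forms. It assumes the register has bits outside the mask
that are meant to be preserved; EXAMPLE_SIZE_MASK and both function names are
hypothetical, and the real FLITE_REG_CIOCAN_MASK value is not shown in this
patch.

```c
#include <stdint.h>

/* Hypothetical mask covering two 14-bit size fields (not the real value). */
#define EXAMPLE_SIZE_MASK 0x3fff3fffu

/* Form before the patch: "=" discards the masked register value. */
static uint32_t set_size_assign(uint32_t reg, uint32_t height, uint32_t width)
{
	uint32_t cfg = reg;

	cfg &= ~EXAMPLE_SIZE_MASK;	/* result is computed ... */
	cfg = (height << 16) | width;	/* ... and then overwritten */
	return cfg;
}

/* Form after the patch: bits outside the mask survive the update. */
static uint32_t set_size_or(uint32_t reg, uint32_t height, uint32_t width)
{
	uint32_t cfg = reg;

	cfg &= ~EXAMPLE_SIZE_MASK;	/* clear only the size fields */
	cfg |= (height << 16) | width;	/* merge the new size in */
	return cfg;
}
```

If the mask covers every bit that is ever set in the register, both forms
write the same value and the change is only a cleanup.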

^ permalink raw reply related

* Re: [PATCH 4.16 55/68] apparmor: fix display of .ns_name for containers
From: Serge E. Hallyn @ 2018-04-18 15:05 UTC (permalink / raw)
  To: Greg Kroah-Hartman; +Cc: linux-kernel, stable, Serge Hallyn, John Johansen
In-Reply-To: <20180417155751.583001796@linuxfoundation.org>

Quoting Greg Kroah-Hartman (gregkh@linuxfoundation.org):
> 4.16-stable review patch.  If anyone has any objections, please let me know.
> 
> ------------------
> 
> From: John Johansen <john.johansen@canonical.com>
> 
> commit 040d9e2bce0a5b321c402b79ee43a8e8d2fd3b06 upstream.
> 
> The .ns_name should not be virtualized by the current ns view. It
> needs to report the ns base name as that is being used during startup
> as part of determining apparmor policy namespace support.
> 
> BugLink: http://bugs.launchpad.net/bugs/1746463
> Fixes: d9f02d9c237aa ("apparmor: fix display of ns name")
> Cc: Stable <stable@vger.kernel.org>
> Reported-by: Serge Hallyn <serge@hallyn.com>

Excellent, thank you - this has been a pretty invasive bug for
nested container usage.

> Tested-by: Serge Hallyn <serge@hallyn.com>
> Signed-off-by: John Johansen <john.johansen@canonical.com>
> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> 
> ---
>  security/apparmor/apparmorfs.c |    4 +---
>  1 file changed, 1 insertion(+), 3 deletions(-)
> 
> --- a/security/apparmor/apparmorfs.c
> +++ b/security/apparmor/apparmorfs.c
> @@ -1189,9 +1189,7 @@ static int seq_ns_level_show(struct seq_
>  static int seq_ns_name_show(struct seq_file *seq, void *v)
>  {
>  	struct aa_label *label = begin_current_label_crit_section();
> -
> -	seq_printf(seq, "%s\n", aa_ns_name(labels_ns(label),
> -					   labels_ns(label), true));
> +	seq_printf(seq, "%s\n", labels_ns(label)->base.name);
>  	end_current_label_crit_section(label);
>  
>  	return 0;
> 

^ permalink raw reply

* Re: [PATCH] SLUB: Do not fallback to mininum order if __GFP_NORETRY is set
From: Mikulas Patocka @ 2018-04-18 15:05 UTC (permalink / raw)
  To: Christopher Lameter
  Cc: Vlastimil Babka, Mike Snitzer, Matthew Wilcox, Pekka Enberg,
	linux-mm, dm-devel, David Rientjes, Joonsoo Kim, Andrew Morton,
	linux-kernel
In-Reply-To: <alpine.DEB.2.20.1804180944180.1062@nuc-kabylake>



On Wed, 18 Apr 2018, Christopher Lameter wrote:

> Mikulas Patocka wants to ensure that no fallback to lower order happens. I
> think __GFP_NORETRY should work correctly in that case too and not fall
> back.
> 
> 
> 
> Allocating at a smaller order is a retry operation and should not
> be attempted.
> 
> If the caller does not want retries then respect that.
> 
> GFP_NORETRY allows callers to ensure that only maximum order
> allocations are attempted.
> 
> Signed-off-by: Christoph Lameter <cl@linux.com>
> 
> Index: linux/mm/slub.c
> ===================================================================
> --- linux.orig/mm/slub.c
> +++ linux/mm/slub.c
> @@ -1598,7 +1598,7 @@ static struct page *allocate_slab(struct
>  		alloc_gfp = (alloc_gfp | __GFP_NOMEMALLOC) & ~(__GFP_RECLAIM|__GFP_NOFAIL);
> 
>  	page = alloc_slab_page(s, alloc_gfp, node, oo);
> -	if (unlikely(!page)) {
> +	if (unlikely(!page) && !(flags & __GFP_NORETRY)) {
>  		oo = s->min;
>  		alloc_gfp = flags;
>  		/*

No, this would hit NULL pointer dereference if page is NULL and 
__GFP_NORETRY is set. You want this:

---
 mm/slub.c |    2 ++
 1 file changed, 2 insertions(+)

Index: linux-2.6/mm/slub.c
===================================================================
--- linux-2.6.orig/mm/slub.c	2018-04-17 20:58:23.000000000 +0200
+++ linux-2.6/mm/slub.c	2018-04-18 17:04:01.000000000 +0200
@@ -1599,6 +1599,8 @@ static struct page *allocate_slab(struct
 
 	page = alloc_slab_page(s, alloc_gfp, node, oo);
 	if (unlikely(!page)) {
+		if (flags & __GFP_NORETRY)
+			goto out;
 		oo = s->min;
 		alloc_gfp = flags;
 		/*
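
The control flow can be modelled in user space as follows. This is only a
sketch of the idea, not the slub code: EX_NORETRY, struct ex_page, ex_alloc()
and ex_allocate_slab() are hypothetical stand-ins. The point is that when the
first allocation fails and the no-retry flag is set, the function has to
leave through the failure path; merely skipping the fallback block would
continue with page == NULL and dereference it.

```c
#include <stddef.h>

#define EX_NORETRY 0x1u	/* hypothetical stand-in for __GFP_NORETRY */

struct ex_page {
	int order;
};

/* Hypothetical allocator: succeeds only for orders <= max_ok_order. */
static struct ex_page *ex_alloc(struct ex_page *slot, int order, int max_ok_order)
{
	if (order > max_ok_order)
		return NULL;
	slot->order = order;
	return slot;
}

/* Returns the order that was allocated, or -1 when allocation failed. */
static int ex_allocate_slab(unsigned int flags, int want_order, int min_order,
			    int max_ok_order)
{
	struct ex_page storage;
	struct ex_page *page = ex_alloc(&storage, want_order, max_ok_order);

	if (!page) {
		if (flags & EX_NORETRY)
			goto out;	/* caller asked for no fallback */
		/* Fall back to the minimum order. */
		page = ex_alloc(&storage, min_order, max_ok_order);
		if (!page)
			goto out;
	}

	return page->order;	/* only reached with a valid page */
out:
	return -1;
}
```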

^ permalink raw reply

* Re: perf probe line numbers + CONFIG_DEBUG_INFO_SPLIT=y
From: Arnaldo Carvalho de Melo @ 2018-04-18 15:05 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Jiri Olsa, Namhyung Kim, Linux Kernel Mailing List, systemtap,
	Mark Wielaard
In-Reply-To: <20180418230301.fd26676ece5b2dc16d98b266@kernel.org>

On Wed, Apr 18, 2018 at 11:03:01PM +0900, Masami Hiramatsu wrote:
> And I found below description in systemtap document(man/error::dwarf.7stap).
> ===
> debuginfo configuration
> Some tools may generate debuginfo that is unsupported by systemtap, such
> as the linux kernel CONFIG_DEBUG_INFO_SPLIT (.dwo files) option.
> Stick with plain ELF/DWARF (optionally split, Fedora-style), if possible.
> ===
 
> So, it seems that elfutils may not support this split debuginfo yet.

Ok, what about detecting that this is the case: .dwo is being used, as
detected by the presence of those .debug_*.dwo ELF sections and then
warning the user that this mode of operation is not supported yet?
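
The name test itself could be as simple as the sketch below. It only covers
matching section names of the form .debug_*.dwo; walking the ELF section
header table to obtain the names is left out, and is_dwo_section_name() and
has_dwo_sections() are hypothetical helpers, not existing perf or elfutils
functions.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* True for names shaped like ".debug_info.dwo" or ".debug_abbrev.dwo". */
static bool is_dwo_section_name(const char *name)
{
	static const char prefix[] = ".debug_";
	static const char suffix[] = ".dwo";
	size_t len = strlen(name);
	size_t plen = sizeof(prefix) - 1;
	size_t slen = sizeof(suffix) - 1;

	/* Require at least one character between prefix and suffix. */
	if (len <= plen + slen)
		return false;

	return strncmp(name, prefix, plen) == 0 &&
	       strcmp(name + len - slen, suffix) == 0;
}

/* True if any of the given section names looks like a .dwo section. */
static bool has_dwo_sections(const char *const *names, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (is_dwo_section_name(names[i]))
			return true;
	return false;
}
```

A caller that gets true back could then print the "not supported yet"
warning instead of failing silently.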

- Arnaldo

^ permalink raw reply
