linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/7] Various fixes for nested sleep stuff
@ 2014-10-31 11:10 Peter Zijlstra
  2014-10-31 11:10 ` [PATCH 1/7] sched,wait: Fix a kthread race with wait_woken() Peter Zijlstra
                   ` (6 more replies)
  0 siblings, 7 replies; 15+ messages in thread
From: Peter Zijlstra @ 2014-10-31 11:10 UTC (permalink / raw)
  To: mingo, linux-kernel
  Cc: peter, oleg, rjw, torvalds, tglx, eparis, umgwanakikbuti, marcel,
	ebiederm, davem, fengguang.wu, Peter Zijlstra

Here's a few patches fixing reported nested sleep warns and issues.



^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH 1/7] sched,wait: Fix a kthread race with wait_woken()
  2014-10-31 11:10 [PATCH 0/7] Various fixes for nested sleep stuff Peter Zijlstra
@ 2014-10-31 11:10 ` Peter Zijlstra
  2014-10-31 22:13   ` Oleg Nesterov
  2014-10-31 11:10 ` [PATCH 2/7] wait: Reimplement wait_event_freezable() Peter Zijlstra
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 15+ messages in thread
From: Peter Zijlstra @ 2014-10-31 11:10 UTC (permalink / raw)
  To: mingo, linux-kernel
  Cc: peter, oleg, rjw, torvalds, tglx, eparis, umgwanakikbuti, marcel,
	ebiederm, davem, fengguang.wu, Peter Zijlstra

[-- Attachment #1: peterz-wait_woken-kthread.patch --]
[-- Type: text/plain, Size: 2015 bytes --]

There is a race between kthread_stop() and the new wait_woken() that
can result in a lack of progress.

CPU 0                                    | CPU 1
                                         |
rfcomm_run()                             | kthread_stop()
  ...                                    |
  if (!test_bit(KTHREAD_SHOULD_STOP))    |
                                         |   set_bit(KTHREAD_SHOULD_STOP)
                                         |   wake_up_process()
    wait_woken()                         |   wait_for_completion()
      set_current_state(INTERRUPTIBLE)   |
      if (!WQ_FLAG_WOKEN)                |
        schedule_timeout()               |
                                         |

After which both tasks will wait.. forever.

Fix this by having wait_woken() check for kthread_should_stop() but
only for kthreads (obviously).

Cc: Peter Hurley <peter@hurleysoftware.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 kernel/sched/wait.c |    7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

--- a/kernel/sched/wait.c
+++ b/kernel/sched/wait.c
@@ -9,6 +9,7 @@
 #include <linux/mm.h>
 #include <linux/wait.h>
 #include <linux/hash.h>
+#include <linux/kthread.h>
 
 void __init_waitqueue_head(wait_queue_head_t *q, const char *name, struct lock_class_key *key)
 {
@@ -297,6 +298,10 @@ int autoremove_wake_function(wait_queue_
 }
 EXPORT_SYMBOL(autoremove_wake_function);
 
+static inline bool is_kthread_should_stop(void)
+{
+	return (current->flags & PF_KTHREAD) && kthread_should_stop();
+}
 
 /*
  * DEFINE_WAIT_FUNC(wait, woken_wake_func);
@@ -326,7 +331,7 @@ long wait_woken(wait_queue_t *wait, unsi
 	 * woken_wake_function() such that if we observe WQ_FLAG_WOKEN we must
 	 * also observe all state before the wakeup.
 	 */
-	if (!(wait->flags & WQ_FLAG_WOKEN))
+	if (!(wait->flags & WQ_FLAG_WOKEN) && !is_kthread_should_stop())
 		timeout = schedule_timeout(timeout);
 	__set_current_state(TASK_RUNNING);
 



^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH 2/7] wait: Reimplement wait_event_freezable()
  2014-10-31 11:10 [PATCH 0/7] Various fixes for nested sleep stuff Peter Zijlstra
  2014-10-31 11:10 ` [PATCH 1/7] sched,wait: Fix a kthread race with wait_woken() Peter Zijlstra
@ 2014-10-31 11:10 ` Peter Zijlstra
  2014-10-31 17:41   ` Peter Zijlstra
  2014-10-31 21:38   ` Oleg Nesterov
  2014-10-31 11:10 ` [PATCH 3/7] wait: Remove wait_event_freezekillable() Peter Zijlstra
                   ` (4 subsequent siblings)
  6 siblings, 2 replies; 15+ messages in thread
From: Peter Zijlstra @ 2014-10-31 11:10 UTC (permalink / raw)
  To: mingo, linux-kernel
  Cc: peter, oleg, rjw, torvalds, tglx, eparis, umgwanakikbuti, marcel,
	ebiederm, davem, fengguang.wu, Peter Zijlstra

[-- Attachment #1: peterz-wait-freezable.patch --]
[-- Type: text/plain, Size: 4763 bytes --]

Provide better implementations of wait_event_freezable() APIs.

The problem is with freezer_do_not_count(), it hides the thread from
the freezer, even though this thread might not actually freeze/sleep
at all.

Cc: oleg@redhat.com
Cc: Rafael Wysocki <rjw@rjwysocki.net>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 include/linux/freezer.h |   38 --------------------------------
 include/linux/wait.h    |   56 ++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 56 insertions(+), 38 deletions(-)

--- a/include/linux/freezer.h
+++ b/include/linux/freezer.h
@@ -265,35 +265,6 @@ static inline int freezable_schedule_hrt
 	__retval;							\
 })
 
-#define wait_event_freezable(wq, condition)				\
-({									\
-	int __retval;							\
-	freezer_do_not_count();						\
-	__retval = wait_event_interruptible(wq, (condition));		\
-	freezer_count();						\
-	__retval;							\
-})
-
-#define wait_event_freezable_timeout(wq, condition, timeout)		\
-({									\
-	long __retval = timeout;					\
-	freezer_do_not_count();						\
-	__retval = wait_event_interruptible_timeout(wq,	(condition),	\
-				__retval);				\
-	freezer_count();						\
-	__retval;							\
-})
-
-#define wait_event_freezable_exclusive(wq, condition)			\
-({									\
-	int __retval;							\
-	freezer_do_not_count();						\
-	__retval = wait_event_interruptible_exclusive(wq, condition);	\
-	freezer_count();						\
-	__retval;							\
-})
-
-
 #else /* !CONFIG_FREEZER */
 static inline bool frozen(struct task_struct *p) { return false; }
 static inline bool freezing(struct task_struct *p) { return false; }
@@ -331,15 +302,6 @@ static inline void set_freezable(void) {
 #define freezable_schedule_hrtimeout_range(expires, delta, mode)	\
 	schedule_hrtimeout_range(expires, delta, mode)
 
-#define wait_event_freezable(wq, condition)				\
-		wait_event_interruptible(wq, condition)
-
-#define wait_event_freezable_timeout(wq, condition, timeout)		\
-		wait_event_interruptible_timeout(wq, condition, timeout)
-
-#define wait_event_freezable_exclusive(wq, condition)			\
-		wait_event_interruptible_exclusive(wq, condition)
-
 #define wait_event_freezekillable(wq, condition)		\
 		wait_event_killable(wq, condition)
 
--- a/include/linux/wait.h
+++ b/include/linux/wait.h
@@ -267,6 +267,30 @@ do {									\
 	__wait_event(wq, condition);					\
 } while (0)
 
+#define __wait_event_freezable(wq, condition)				\
+	(void)___wait_event(wq, condition, TASK_INTERRUPTIBLE, 0, 0,	\
+			    schedule(); try_to_freeze())
+
+/**
+ * wait_event - sleep (or freeze) until a condition gets true
+ * @wq: the waitqueue to wait on
+ * @condition: a C expression for the event to wait for
+ *
+ * The process is put to sleep (TASK_INTERRUPTIBLE -- so as not to contribute
+ * to system load) until the @condition evaluates to true. The
+ * @condition is checked each time the waitqueue @wq is woken up.
+ *
+ * wake_up() has to be called after changing any variable that could
+ * change the result of the wait condition.
+ */
+#define wait_event_freezable(wq, condition)				\
+do {									\
+	might_sleep();							\
+	if (condition)							\
+		break;							\
+	__wait_event_freezable(wq, condition);				\
+} while (0)
+
 #define __wait_event_timeout(wq, condition, timeout)			\
 	___wait_event(wq, ___wait_cond_timeout(condition),		\
 		      TASK_UNINTERRUPTIBLE, 0, timeout,			\
@@ -300,6 +324,24 @@ do {									\
 	__ret;								\
 })
 
+#define __wait_event_freezable_timeout(wq, condition, timeout)		\
+	___wait_event(wq, ___wait_cond_timeout(condition),		\
+		      TASK_INTERRUPTIBLE, 0, timeout,			\
+		      __ret = schedule_timeout(__ret); try_to_freeze())
+
+/*
+ * like wait_event_timeout() -- except it uses TASK_INTERRUPTIBLE to avoid
+ * increasing load and is freezable.
+ */
+#define wait_event_freezable_timeout(wq, condition, timeout)		\
+({									\
+	long __ret = timeout;						\
+	might_sleep();							\
+	if (!___wait_cond_timeout(condition))				\
+		__ret = __wait_event_freezable_timeout(wq, condition, timeout);	\
+	__ret;								\
+})
+
 #define __wait_event_cmd(wq, condition, cmd1, cmd2)			\
 	(void)___wait_event(wq, condition, TASK_UNINTERRUPTIBLE, 0, 0,	\
 			    cmd1; schedule(); cmd2)
@@ -479,6 +521,20 @@ do {									\
 	__ret;								\
 })
 
+
+#define __wait_event_freezable_exclusive(wq, condition)			\
+	___wait_event(wq, condition, TASK_INTERRUPTIBLE, 1, 0,		\
+			schedule(); try_to_freeze())
+
+#define wait_event_freezable_exclusive(wq, condition)			\
+({									\
+	int __ret = 0;							\
+	might_sleep();							\
+	if (!(condition))						\
+		__ret = __wait_event_freezable_exclusive(wq, condition);\
+	__ret;								\
+})
+
 
 #define __wait_event_interruptible_locked(wq, condition, exclusive, irq) \
 ({									\



^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH 3/7] wait: Remove wait_event_freezekillable()
  2014-10-31 11:10 [PATCH 0/7] Various fixes for nested sleep stuff Peter Zijlstra
  2014-10-31 11:10 ` [PATCH 1/7] sched,wait: Fix a kthread race with wait_woken() Peter Zijlstra
  2014-10-31 11:10 ` [PATCH 2/7] wait: Reimplement wait_event_freezable() Peter Zijlstra
@ 2014-10-31 11:10 ` Peter Zijlstra
  2014-10-31 11:10 ` [PATCH 4/7] audit,wait: Fixup kauditd_thread wait loop Peter Zijlstra
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 15+ messages in thread
From: Peter Zijlstra @ 2014-10-31 11:10 UTC (permalink / raw)
  To: mingo, linux-kernel
  Cc: peter, oleg, rjw, torvalds, tglx, eparis, umgwanakikbuti, marcel,
	ebiederm, davem, fengguang.wu, Peter Zijlstra

[-- Attachment #1: peterz-wait-freezable-2.patch --]
[-- Type: text/plain, Size: 1132 bytes --]

There is no user.. make it go away.

Cc: oleg@redhat.com
Cc: Rafael Wysocki <rjw@rjwysocki.net>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 include/linux/freezer.h |   12 ------------
 1 file changed, 12 deletions(-)

--- a/include/linux/freezer.h
+++ b/include/linux/freezer.h
@@ -246,15 +246,6 @@ static inline int freezable_schedule_hrt
  * defined in <linux/wait.h>
  */
 
-#define wait_event_freezekillable(wq, condition)			\
-({									\
-	int __retval;							\
-	freezer_do_not_count();						\
-	__retval = wait_event_killable(wq, (condition));		\
-	freezer_count();						\
-	__retval;							\
-})
-
 /* DO NOT ADD ANY NEW CALLERS OF THIS FUNCTION */
 #define wait_event_freezekillable_unsafe(wq, condition)			\
 ({									\
@@ -302,9 +293,6 @@ static inline void set_freezable(void) {
 #define freezable_schedule_hrtimeout_range(expires, delta, mode)	\
 	schedule_hrtimeout_range(expires, delta, mode)
 
-#define wait_event_freezekillable(wq, condition)		\
-		wait_event_killable(wq, condition)
-
 #define wait_event_freezekillable_unsafe(wq, condition)			\
 		wait_event_killable(wq, condition)
 



^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH 4/7] audit,wait: Fixup kauditd_thread wait loop
  2014-10-31 11:10 [PATCH 0/7] Various fixes for nested sleep stuff Peter Zijlstra
                   ` (2 preceding siblings ...)
  2014-10-31 11:10 ` [PATCH 3/7] wait: Remove wait_event_freezekillable() Peter Zijlstra
@ 2014-10-31 11:10 ` Peter Zijlstra
  2014-10-31 11:10 ` [PATCH 5/7] rfcomm: Fix broken wait construct Peter Zijlstra
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 15+ messages in thread
From: Peter Zijlstra @ 2014-10-31 11:10 UTC (permalink / raw)
  To: mingo, linux-kernel
  Cc: peter, oleg, rjw, torvalds, tglx, eparis, umgwanakikbuti, marcel,
	ebiederm, davem, fengguang.wu, Peter Zijlstra

[-- Attachment #1: peter_zijlstra-nested_sleeps_fixes_and_debug_infrastructure.patch --]
[-- Type: text/plain, Size: 1522 bytes --]

The kauditd_thread wait loop is a bit iffy; it has a number of problems:

 - calls try_to_freeze() before schedule(); you typically want the
   thread to re-evaluate the sleep condition when unfreezing, also
   freeze_task() issues a wakeup.

 - it unconditionally does the {add,remove}_wait_queue(), even when the
   sleep condition is false.

Use wait_event_freezable() that does the right thing.

Cc: mingo@kernel.org
Cc: oleg@redhat.com
Cc: torvalds@linux-foundation.org
Cc: tglx@linutronix.de
Cc: Eric Paris <eparis@redhat.com>
Reported-by: Mike Galbraith <umgwanakikbuti@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20141002102251.GA6324@worktop.programming.kicks-ass.net
---
 kernel/audit.c |   11 +----------
 1 file changed, 1 insertion(+), 10 deletions(-)

--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -499,7 +499,6 @@ static int kauditd_thread(void *dummy)
 	set_freezable();
 	while (!kthread_should_stop()) {
 		struct sk_buff *skb;
-		DECLARE_WAITQUEUE(wait, current);
 
 		flush_hold_queue();
 
@@ -514,16 +513,8 @@ static int kauditd_thread(void *dummy)
 				audit_printk_skb(skb);
 			continue;
 		}
-		set_current_state(TASK_INTERRUPTIBLE);
-		add_wait_queue(&kauditd_wait, &wait);
 
-		if (!skb_queue_len(&audit_skb_queue)) {
-			try_to_freeze();
-			schedule();
-		}
-
-		__set_current_state(TASK_RUNNING);
-		remove_wait_queue(&kauditd_wait, &wait);
+		wait_event_freezable(kauditd_wait, skb_queue_len(&audit_skb_queue));
 	}
 	return 0;
 }



^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH 5/7] rfcomm: Fix broken wait construct
  2014-10-31 11:10 [PATCH 0/7] Various fixes for nested sleep stuff Peter Zijlstra
                   ` (3 preceding siblings ...)
  2014-10-31 11:10 ` [PATCH 4/7] audit,wait: Fixup kauditd_thread wait loop Peter Zijlstra
@ 2014-10-31 11:10 ` Peter Zijlstra
  2014-10-31 11:10 ` [PATCH 6/7] netdev: Fix sleeping inside wait event Peter Zijlstra
  2014-10-31 11:10 ` [PATCH 7/7] sched: Use WARN_ONCE for the might_sleep() TASK_RUNNING test Peter Zijlstra
  6 siblings, 0 replies; 15+ messages in thread
From: Peter Zijlstra @ 2014-10-31 11:10 UTC (permalink / raw)
  To: mingo, linux-kernel
  Cc: peter, oleg, rjw, torvalds, tglx, eparis, umgwanakikbuti, marcel,
	ebiederm, davem, fengguang.wu, Peter Zijlstra

[-- Attachment #1: peterz-fix-rfcomm.patch --]
[-- Type: text/plain, Size: 1539 bytes --]

rfcomm_run() is a tad broken in that is has a nested wait loop. One
cannot rely on p->state for the outer wait because the inner wait will
overwrite it.

Cc: Marcel Holtmann <marcel@holtmann.org>
Cc: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 net/bluetooth/rfcomm/core.c |   18 ++++++++----------
 1 file changed, 8 insertions(+), 10 deletions(-)

--- a/net/bluetooth/rfcomm/core.c
+++ b/net/bluetooth/rfcomm/core.c
@@ -101,11 +101,11 @@ static struct rfcomm_session *rfcomm_ses
 #define __get_rpn_stop_bits(line) (((line) >> 2) & 0x1)
 #define __get_rpn_parity(line)    (((line) >> 3) & 0x7)
 
+static DECLARE_WAIT_QUEUE_HEAD(rfcomm_wq);
+
 static void rfcomm_schedule(void)
 {
-	if (!rfcomm_thread)
-		return;
-	wake_up_process(rfcomm_thread);
+	wake_up_all(&rfcomm_wq);
 }
 
 /* ---- RFCOMM FCS computation ---- */
@@ -2086,24 +2086,22 @@ static void rfcomm_kill_listener(void)
 
 static int rfcomm_run(void *unused)
 {
+	DEFINE_WAIT_FUNC(wait, woken_wake_function);
 	BT_DBG("");
 
 	set_user_nice(current, -10);
 
 	rfcomm_add_listener(BDADDR_ANY);
 
-	while (1) {
-		set_current_state(TASK_INTERRUPTIBLE);
-
-		if (kthread_should_stop())
-			break;
+	add_wait_queue(&rfcomm_wq, &wait);
+	while (!kthread_should_stop()) {
 
 		/* Process stuff */
 		rfcomm_process_sessions();
 
-		schedule();
+		wait_woken(&wait, TASK_INTERRUPTIBLE, MAX_SCHEDULE_TIMEOUT);
 	}
-	__set_current_state(TASK_RUNNING);
+	remove_wait_queue(&rfcomm_wq, &wait);
 
 	rfcomm_kill_listener();
 



^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH 6/7] netdev: Fix sleeping inside wait event
  2014-10-31 11:10 [PATCH 0/7] Various fixes for nested sleep stuff Peter Zijlstra
                   ` (4 preceding siblings ...)
  2014-10-31 11:10 ` [PATCH 5/7] rfcomm: Fix broken wait construct Peter Zijlstra
@ 2014-10-31 11:10 ` Peter Zijlstra
  2014-10-31 11:10 ` [PATCH 7/7] sched: Use WARN_ONCE for the might_sleep() TASK_RUNNING test Peter Zijlstra
  6 siblings, 0 replies; 15+ messages in thread
From: Peter Zijlstra @ 2014-10-31 11:10 UTC (permalink / raw)
  To: mingo, linux-kernel
  Cc: peter, oleg, rjw, torvalds, tglx, eparis, umgwanakikbuti, marcel,
	ebiederm, davem, fengguang.wu, Peter Zijlstra

[-- Attachment #1: peterz-fix-netdev-device_exit_batch.patch --]
[-- Type: text/plain, Size: 2333 bytes --]

rtnl_lock_unregistering*() take rtnl_lock() -- a mutex -- inside a
wait loop. The wait loop relies on current->state to function, but so
does mutex_lock(), nesting them makes for the inner to destroy the
outer state.

Fix this using the new wait_woken() bits.

Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Acked-by: David S. Miller <davem@davemloft.net>
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20141029173110.GE15602@worktop.programming.kicks-ass.net
---
 net/core/dev.c       |   10 +++++-----
 net/core/rtnetlink.c |   10 +++++-----
 2 files changed, 10 insertions(+), 10 deletions(-)

--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -7196,11 +7196,10 @@ static void __net_exit rtnl_lock_unregis
 	 */
 	struct net *net;
 	bool unregistering;
-	DEFINE_WAIT(wait);
+	DEFINE_WAIT_FUNC(wait, woken_wake_function);
 
+	add_wait_queue(&netdev_unregistering_wq, &wait);
 	for (;;) {
-		prepare_to_wait(&netdev_unregistering_wq, &wait,
-				TASK_UNINTERRUPTIBLE);
 		unregistering = false;
 		rtnl_lock();
 		list_for_each_entry(net, net_list, exit_list) {
@@ -7212,9 +7211,10 @@ static void __net_exit rtnl_lock_unregis
 		if (!unregistering)
 			break;
 		__rtnl_unlock();
-		schedule();
+
+		wait_woken(&wait, TASK_UNINTERRUPTIBLE, MAX_SCHEDULE_TIMEOUT);
 	}
-	finish_wait(&netdev_unregistering_wq, &wait);
+	remove_wait_queue(&netdev_unregistering_wq, &wait);
 }
 
 static void __net_exit default_device_exit_batch(struct list_head *net_list)
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -365,11 +365,10 @@ static void rtnl_lock_unregistering_all(
 {
 	struct net *net;
 	bool unregistering;
-	DEFINE_WAIT(wait);
+	DEFINE_WAIT_FUNC(wait, woken_wake_function);
 
+	add_wait_queue(&netdev_unregistering_wq, &wait);
 	for (;;) {
-		prepare_to_wait(&netdev_unregistering_wq, &wait,
-				TASK_UNINTERRUPTIBLE);
 		unregistering = false;
 		rtnl_lock();
 		for_each_net(net) {
@@ -381,9 +380,10 @@ static void rtnl_lock_unregistering_all(
 		if (!unregistering)
 			break;
 		__rtnl_unlock();
-		schedule();
+
+		wait_woken(&wait, TASK_UNINTERRUPTIBLE, MAX_SCHEDULE_TIMEOUT);
 	}
-	finish_wait(&netdev_unregistering_wq, &wait);
+	remove_wait_queue(&netdev_unregistering_wq, &wait);
 }
 
 /**



^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH 7/7] sched: Use WARN_ONCE for the might_sleep() TASK_RUNNING test
  2014-10-31 11:10 [PATCH 0/7] Various fixes for nested sleep stuff Peter Zijlstra
                   ` (5 preceding siblings ...)
  2014-10-31 11:10 ` [PATCH 6/7] netdev: Fix sleeping inside wait event Peter Zijlstra
@ 2014-10-31 11:10 ` Peter Zijlstra
  2014-10-31 22:42   ` Oleg Nesterov
  6 siblings, 1 reply; 15+ messages in thread
From: Peter Zijlstra @ 2014-10-31 11:10 UTC (permalink / raw)
  To: mingo, linux-kernel
  Cc: peter, oleg, rjw, torvalds, tglx, eparis, umgwanakikbuti, marcel,
	ebiederm, davem, fengguang.wu, Peter Zijlstra

[-- Attachment #1: peterz-wait-warn.patch --]
[-- Type: text/plain, Size: 676 bytes --]

In some cases this can trigger a true flood of output.

Requested-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 kernel/sched/core.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -7301,7 +7301,7 @@ void __might_sleep(const char *file, int
 	 * since we will exit with TASK_RUNNING make sure we enter with it,
 	 * otherwise we will destroy state.
 	 */
-	if (WARN(current->state != TASK_RUNNING,
+	if (WARN_ONCE(current->state != TASK_RUNNING,
 			"do not call blocking ops when !TASK_RUNNING; "
 			"state=%lx set at [<%p>] %pS\n",
 			current->state,



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/7] wait: Reimplement wait_event_freezable()
  2014-10-31 11:10 ` [PATCH 2/7] wait: Reimplement wait_event_freezable() Peter Zijlstra
@ 2014-10-31 17:41   ` Peter Zijlstra
  2014-10-31 21:38   ` Oleg Nesterov
  1 sibling, 0 replies; 15+ messages in thread
From: Peter Zijlstra @ 2014-10-31 17:41 UTC (permalink / raw)
  To: mingo, linux-kernel
  Cc: peter, oleg, rjw, torvalds, tglx, eparis, umgwanakikbuti, marcel,
	ebiederm, davem, fengguang.wu

On Fri, Oct 31, 2014 at 12:10:39PM +0100, Peter Zijlstra wrote:
> Provide better implementations of wait_event_freezable() APIs.
> 
> The problem is with freezer_do_not_count(), it hides the thread from
> the freezer, even though this thread might not actually freeze/sleep
> at all.
> 
> Cc: oleg@redhat.com
> Cc: Rafael Wysocki <rjw@rjwysocki.net>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---

This would maybe compile better.. gotta run, will test later.

--- a/include/linux/wait.h
+++ b/include/linux/wait.h
@@ -284,12 +284,13 @@ do {									\
  * change the result of the wait condition.
  */
 #define wait_event_freezable(wq, condition)				\
-do {									\
+({									\
+	int __ret = 0;							\
 	might_sleep();							\
-	if (condition)							\
-		break;							\
-	__wait_event_freezable(wq, condition);				\
-} while (0)
+	if (!(condition))						\
+		__ret = __wait_event_freezable(wq, condition);		\
+	__ret;								\
+})
 
 #define __wait_event_timeout(wq, condition, timeout)			\
 	___wait_event(wq, ___wait_cond_timeout(condition),		\

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/7] wait: Reimplement wait_event_freezable()
  2014-10-31 11:10 ` [PATCH 2/7] wait: Reimplement wait_event_freezable() Peter Zijlstra
  2014-10-31 17:41   ` Peter Zijlstra
@ 2014-10-31 21:38   ` Oleg Nesterov
  2014-10-31 21:48     ` Peter Zijlstra
  2014-10-31 21:53     ` Oleg Nesterov
  1 sibling, 2 replies; 15+ messages in thread
From: Oleg Nesterov @ 2014-10-31 21:38 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo, linux-kernel, peter, rjw, torvalds, tglx, eparis,
	umgwanakikbuti, marcel, ebiederm, davem, fengguang.wu

On 10/31, Peter Zijlstra wrote:
>
> Provide better implementations of wait_event_freezable() APIs.
>
> The problem is with freezer_do_not_count(), it hides the thread from
> the freezer, even though this thread might not actually freeze/sleep
> at all.

I agree, wait_event_freezable() is awful. But could you clarify "at all" ?

Sure, the task can be preempted right after it sets, it can do a lot
of things before it calls schedule(), it can be woken after that and
it can run again and do something else before freezer_count() calls
try_to_freeze(), etc.

Is this what you meant?


> +#define __wait_event_freezable(wq, condition)				\
> +	(void)___wait_event(wq, condition, TASK_INTERRUPTIBLE, 0, 0,	\
> +			    schedule(); try_to_freeze())

I don't think this can work. wait_event_freezable() should be used by
kernel threads and thus we can't rely on TIF_SIGPENDING, freeze_task()
simply does wake_up_state(TASK_INTERRUPTIBLE) in this case.

Just for example, suppose that try_to_freeze_tasks() calls freeze_task()
before this kthread sets current->state = TASK_INTERRUPTIBLE. In this
case __wait_event_freezable()->schedule() will happily sleep and
try_to_freeze_tasks() will fail.

That is why I tried to suggest cmd == freezable_schedule(). Still not
good, but at least this narrows the window and (perhaps) we can improve
this freezable_schedule() later.

But on a second thought... Probably cmd => try_to_freeze(); schedule();
should work. Or just

	#define __wait_event_freezable(wq, condition)	\
		__wait_event_interruptible(wq, ({ try_to_freeze(); (condition); }))

which looks simpler.

Oleg.


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/7] wait: Reimplement wait_event_freezable()
  2014-10-31 21:38   ` Oleg Nesterov
@ 2014-10-31 21:48     ` Peter Zijlstra
  2014-10-31 21:53     ` Oleg Nesterov
  1 sibling, 0 replies; 15+ messages in thread
From: Peter Zijlstra @ 2014-10-31 21:48 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: mingo, linux-kernel, peter, rjw, torvalds, tglx, eparis,
	umgwanakikbuti, marcel, ebiederm, davem, fengguang.wu

On Fri, Oct 31, 2014 at 10:38:02PM +0100, Oleg Nesterov wrote:
> On 10/31, Peter Zijlstra wrote:
> >
> > Provide better implementations of wait_event_freezable() APIs.
> >
> > The problem is with freezer_do_not_count(), it hides the thread from
> > the freezer, even though this thread might not actually freeze/sleep
> > at all.
> 
> I agree, wait_event_freezable() is awful. But could you clarify "at all" ?
> 
> Sure, the task can be preempted right after it sets, it can do a lot
> of things before it calls schedule(), it can be woken after that and
> it can run again and do something else before freezer_count() calls
> try_to_freeze(), etc.
> 
> Is this what you meant?

Yes.

> > +#define __wait_event_freezable(wq, condition)				\
> > +	(void)___wait_event(wq, condition, TASK_INTERRUPTIBLE, 0, 0,	\
> > +			    schedule(); try_to_freeze())
> 
> I don't think this can work.

Yeah, that was horribly broken. defconfig didn't seem to find a usage
site. Wu's robot offered a .config quickly enough though ;-)

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/7] wait: Reimplement wait_event_freezable()
  2014-10-31 21:38   ` Oleg Nesterov
  2014-10-31 21:48     ` Peter Zijlstra
@ 2014-10-31 21:53     ` Oleg Nesterov
  1 sibling, 0 replies; 15+ messages in thread
From: Oleg Nesterov @ 2014-10-31 21:53 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo, linux-kernel, peter, rjw, torvalds, tglx, eparis,
	umgwanakikbuti, marcel, ebiederm, davem, fengguang.wu

On 10/31, Oleg Nesterov wrote:
>
> On 10/31, Peter Zijlstra wrote:
> >
> > Provide better implementations of wait_event_freezable() APIs.
> >
> > The problem is with freezer_do_not_count(), it hides the thread from
> > the freezer, even though this thread might not actually freeze/sleep
> > at all.
>
> I agree, wait_event_freezable() is awful. But could you clarify "at all" ?
>
> Sure, the task can be preempted right after it sets, it can do a lot
> of things before it calls schedule(), it can be woken after that and
> it can run again and do something else before freezer_count() calls
> try_to_freeze(), etc.
>
> Is this what you meant?
>
>
> > +#define __wait_event_freezable(wq, condition)				\
> > +	(void)___wait_event(wq, condition, TASK_INTERRUPTIBLE, 0, 0,	\
> > +			    schedule(); try_to_freeze())
>
> I don't think this can work. wait_event_freezable() should be used by
> kernel threads and thus we can't rely on TIF_SIGPENDING, freeze_task()
> simply does wake_up_state(TASK_INTERRUPTIBLE) in this case.
>
> Just for example, suppose that try_to_freeze_tasks() calls freeze_task()
> before this kthread sets current->state = TASK_INTERRUPTIBLE. In this
> case __wait_event_freezable()->schedule() will happily sleep and
> try_to_freeze_tasks() will fail.
>
> That is why I tried to suggest cmd == freezable_schedule(). Still not
> good, but at least this narrows the window and (perhaps) we can improve
> this freezable_schedule() later.
>
> But on a second thought... Probably cmd => try_to_freeze(); schedule();
> should work. Or just
>
> 	#define __wait_event_freezable(wq, condition)	\
> 		__wait_event_interruptible(wq, ({ try_to_freeze(); (condition); }))
>
> which looks simpler.

Ah, but I forgot about another problem... can't we finally kill this ugly
"long save = current->state" code in __refrigerator() ? Nobody seem to
understand why we are doing this, and this looks just wrong.

And this doesn't allow us to do try_to_freeze() + schedule().

Oleg.


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/7] sched,wait: Fix a kthread race with wait_woken()
  2014-10-31 11:10 ` [PATCH 1/7] sched,wait: Fix a kthread race with wait_woken() Peter Zijlstra
@ 2014-10-31 22:13   ` Oleg Nesterov
  0 siblings, 0 replies; 15+ messages in thread
From: Oleg Nesterov @ 2014-10-31 22:13 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo, linux-kernel, peter, rjw, torvalds, tglx, eparis,
	umgwanakikbuti, marcel, ebiederm, davem, fengguang.wu

On 10/31, Peter Zijlstra wrote:
>
> There is a race between kthread_stop() and the new wait_woken() that
> can result in a lack of progress.

Likewise, the user of wait_woken() can miss any other event which is
not associated with wq we are going to sleep on. Please see below.

> +static inline bool is_kthread_should_stop(void)
> +{
> +	return (current->flags & PF_KTHREAD) && kthread_should_stop();
> +}
>
>  /*
>   * DEFINE_WAIT_FUNC(wait, woken_wake_func);
> @@ -326,7 +331,7 @@ long wait_woken(wait_queue_t *wait, unsi
>  	 * woken_wake_function() such that if we observe WQ_FLAG_WOKEN we must
>  	 * also observe all state before the wakeup.
>  	 */
> -	if (!(wait->flags & WQ_FLAG_WOKEN))
> +	if (!(wait->flags & WQ_FLAG_WOKEN) && !is_kthread_should_stop())
>  		timeout = schedule_timeout(timeout);

Well yes, this is more straightforward than other hacks we discussed before.
But see above, this doesn't look flexible enough.

And. This assumes that the user must also check kthread_should_stop(),
otherwise the waiting loop becomes a busy-wait loop.

So I won't argue, but I still think it would be better to allow the user to
do set_task_state() by hand if it needs to check the additional conditions.

Oleg.


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 7/7] sched: Use WARN_ONCE for the might_sleep() TASK_RUNNING test
  2014-10-31 11:10 ` [PATCH 7/7] sched: Use WARN_ONCE for the might_sleep() TASK_RUNNING test Peter Zijlstra
@ 2014-10-31 22:42   ` Oleg Nesterov
  2014-11-03  7:56     ` Peter Zijlstra
  0 siblings, 1 reply; 15+ messages in thread
From: Oleg Nesterov @ 2014-10-31 22:42 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo, linux-kernel, peter, rjw, torvalds, tglx, eparis,
	umgwanakikbuti, marcel, ebiederm, davem, fengguang.wu

On 10/31, Peter Zijlstra wrote:
>
> In some cases this can trigger a true flood of output.
>
> Requested-by: Ingo Molnar <mingo@kernel.org>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
>  kernel/sched/core.c |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -7301,7 +7301,7 @@ void __might_sleep(const char *file, int
>  	 * since we will exit with TASK_RUNNING make sure we enter with it,
>  	 * otherwise we will destroy state.
>  	 */
> -	if (WARN(current->state != TASK_RUNNING,
> +	if (WARN_ONCE(current->state != TASK_RUNNING,

Agreed, but sorry for off-topic, can't resist.

Sometimes I hate WARN_ONCE() because you can't reproduce the problem
once again without reboot.

Perhaps WARN_ON_RATELIMIT() should be used more often (not sure about
this particular case). Or, perhaps, we can add a special section for
these "__warned" variables and add, say, sysctl which clears that
section ?

Oleg.


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 7/7] sched: Use WARN_ONCE for the might_sleep() TASK_RUNNING test
  2014-10-31 22:42   ` Oleg Nesterov
@ 2014-11-03  7:56     ` Peter Zijlstra
  0 siblings, 0 replies; 15+ messages in thread
From: Peter Zijlstra @ 2014-11-03  7:56 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: mingo, linux-kernel, peter, rjw, torvalds, tglx, eparis,
	umgwanakikbuti, marcel, ebiederm, davem, fengguang.wu

On Fri, Oct 31, 2014 at 11:42:37PM +0100, Oleg Nesterov wrote:
> On 10/31, Peter Zijlstra wrote:
> >
> > In some cases this can trigger a true flood of output.
> >
> > Requested-by: Ingo Molnar <mingo@kernel.org>
> > Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> > ---
> >  kernel/sched/core.c |    2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > --- a/kernel/sched/core.c
> > +++ b/kernel/sched/core.c
> > @@ -7301,7 +7301,7 @@ void __might_sleep(const char *file, int
> >  	 * since we will exit with TASK_RUNNING make sure we enter with it,
> >  	 * otherwise we will destroy state.
> >  	 */
> > -	if (WARN(current->state != TASK_RUNNING,
> > +	if (WARN_ONCE(current->state != TASK_RUNNING,
> 
> Agreed, but sorry for off-topic, can't resist.
> 
> Sometimes I hate WARN_ONCE() because you can't reproduce the problem
> once again without reboot.

Yes, and the fact that you can only see the first fail, even if more are
present.

> Perhaps WARN_ON_RATELIMIT() should be used more often (not sure about
> this particular case). Or, perhaps, we can add a special section for
> these "__warned" variables and add, say, sysctl which clears that
> section ?

Yeah, maybe, /debug/warn_once/file/line/enable or whatnot. For now I'll
continue removing ONCEs whenever I feel like it though ;-)

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2014-11-03  7:57 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-10-31 11:10 [PATCH 0/7] Various fixes for nested sleep stuff Peter Zijlstra
2014-10-31 11:10 ` [PATCH 1/7] sched,wait: Fix a kthread race with wait_woken() Peter Zijlstra
2014-10-31 22:13   ` Oleg Nesterov
2014-10-31 11:10 ` [PATCH 2/7] wait: Reimplement wait_event_freezable() Peter Zijlstra
2014-10-31 17:41   ` Peter Zijlstra
2014-10-31 21:38   ` Oleg Nesterov
2014-10-31 21:48     ` Peter Zijlstra
2014-10-31 21:53     ` Oleg Nesterov
2014-10-31 11:10 ` [PATCH 3/7] wait: Remove wait_event_freezekillable() Peter Zijlstra
2014-10-31 11:10 ` [PATCH 4/7] audit,wait: Fixup kauditd_thread wait loop Peter Zijlstra
2014-10-31 11:10 ` [PATCH 5/7] rfcomm: Fix broken wait construct Peter Zijlstra
2014-10-31 11:10 ` [PATCH 6/7] netdev: Fix sleeping inside wait event Peter Zijlstra
2014-10-31 11:10 ` [PATCH 7/7] sched: Use WARN_ONCE for the might_sleep() TASK_RUNNING test Peter Zijlstra
2014-10-31 22:42   ` Oleg Nesterov
2014-11-03  7:56     ` Peter Zijlstra

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).