* [PATCH 0/7] Various fixes for nested sleep stuff
@ 2014-10-31 11:10 Peter Zijlstra
2014-10-31 11:10 ` [PATCH 1/7] sched,wait: Fix a kthread race with wait_woken() Peter Zijlstra
` (6 more replies)
0 siblings, 7 replies; 15+ messages in thread
From: Peter Zijlstra @ 2014-10-31 11:10 UTC (permalink / raw)
To: mingo, linux-kernel
Cc: peter, oleg, rjw, torvalds, tglx, eparis, umgwanakikbuti, marcel,
ebiederm, davem, fengguang.wu, Peter Zijlstra
Here's a few patches fixing reported nested sleep warns and issues.
^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH 1/7] sched,wait: Fix a kthread race with wait_woken()
2014-10-31 11:10 [PATCH 0/7] Various fixes for nested sleep stuff Peter Zijlstra
@ 2014-10-31 11:10 ` Peter Zijlstra
2014-10-31 22:13 ` Oleg Nesterov
2014-10-31 11:10 ` [PATCH 2/7] wait: Reimplement wait_event_freezable() Peter Zijlstra
` (5 subsequent siblings)
6 siblings, 1 reply; 15+ messages in thread
From: Peter Zijlstra @ 2014-10-31 11:10 UTC (permalink / raw)
To: mingo, linux-kernel
Cc: peter, oleg, rjw, torvalds, tglx, eparis, umgwanakikbuti, marcel,
ebiederm, davem, fengguang.wu, Peter Zijlstra
[-- Attachment #1: peterz-wait_woken-kthread.patch --]
[-- Type: text/plain, Size: 2015 bytes --]
There is a race between kthread_stop() and the new wait_woken() that
can result in a lack of progress.
CPU 0 | CPU 1
|
rfcomm_run() | kthread_stop()
... |
if (!test_bit(KTHREAD_SHOULD_STOP)) |
| set_bit(KTHREAD_SHOULD_STOP)
| wake_up_process()
wait_woken() | wait_for_completion()
set_current_state(INTERRUPTIBLE) |
if (!WQ_FLAG_WOKEN) |
schedule_timeout() |
|
After which both tasks will wait.. forever.
Fix this by having wait_woken() check for kthread_should_stop() but
only for kthreads (obviously).
Cc: Peter Hurley <peter@hurleysoftware.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
kernel/sched/wait.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
--- a/kernel/sched/wait.c
+++ b/kernel/sched/wait.c
@@ -9,6 +9,7 @@
#include <linux/mm.h>
#include <linux/wait.h>
#include <linux/hash.h>
+#include <linux/kthread.h>
void __init_waitqueue_head(wait_queue_head_t *q, const char *name, struct lock_class_key *key)
{
@@ -297,6 +298,10 @@ int autoremove_wake_function(wait_queue_
}
EXPORT_SYMBOL(autoremove_wake_function);
+static inline bool is_kthread_should_stop(void)
+{
+ return (current->flags & PF_KTHREAD) && kthread_should_stop();
+}
/*
* DEFINE_WAIT_FUNC(wait, woken_wake_func);
@@ -326,7 +331,7 @@ long wait_woken(wait_queue_t *wait, unsi
* woken_wake_function() such that if we observe WQ_FLAG_WOKEN we must
* also observe all state before the wakeup.
*/
- if (!(wait->flags & WQ_FLAG_WOKEN))
+ if (!(wait->flags & WQ_FLAG_WOKEN) && !is_kthread_should_stop())
timeout = schedule_timeout(timeout);
__set_current_state(TASK_RUNNING);
^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH 2/7] wait: Reimplement wait_event_freezable()
2014-10-31 11:10 [PATCH 0/7] Various fixes for nested sleep stuff Peter Zijlstra
2014-10-31 11:10 ` [PATCH 1/7] sched,wait: Fix a kthread race with wait_woken() Peter Zijlstra
@ 2014-10-31 11:10 ` Peter Zijlstra
2014-10-31 17:41 ` Peter Zijlstra
2014-10-31 21:38 ` Oleg Nesterov
2014-10-31 11:10 ` [PATCH 3/7] wait: Remove wait_event_freezekillable() Peter Zijlstra
` (4 subsequent siblings)
6 siblings, 2 replies; 15+ messages in thread
From: Peter Zijlstra @ 2014-10-31 11:10 UTC (permalink / raw)
To: mingo, linux-kernel
Cc: peter, oleg, rjw, torvalds, tglx, eparis, umgwanakikbuti, marcel,
ebiederm, davem, fengguang.wu, Peter Zijlstra
[-- Attachment #1: peterz-wait-freezable.patch --]
[-- Type: text/plain, Size: 4763 bytes --]
Provide better implementations of wait_event_freezable() APIs.
The problem is with freezer_do_not_count(), it hides the thread from
the freezer, even though this thread might not actually freeze/sleep
at all.
Cc: oleg@redhat.com
Cc: Rafael Wysocki <rjw@rjwysocki.net>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
include/linux/freezer.h | 38 --------------------------------
include/linux/wait.h | 56 ++++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 56 insertions(+), 38 deletions(-)
--- a/include/linux/freezer.h
+++ b/include/linux/freezer.h
@@ -265,35 +265,6 @@ static inline int freezable_schedule_hrt
__retval; \
})
-#define wait_event_freezable(wq, condition) \
-({ \
- int __retval; \
- freezer_do_not_count(); \
- __retval = wait_event_interruptible(wq, (condition)); \
- freezer_count(); \
- __retval; \
-})
-
-#define wait_event_freezable_timeout(wq, condition, timeout) \
-({ \
- long __retval = timeout; \
- freezer_do_not_count(); \
- __retval = wait_event_interruptible_timeout(wq, (condition), \
- __retval); \
- freezer_count(); \
- __retval; \
-})
-
-#define wait_event_freezable_exclusive(wq, condition) \
-({ \
- int __retval; \
- freezer_do_not_count(); \
- __retval = wait_event_interruptible_exclusive(wq, condition); \
- freezer_count(); \
- __retval; \
-})
-
-
#else /* !CONFIG_FREEZER */
static inline bool frozen(struct task_struct *p) { return false; }
static inline bool freezing(struct task_struct *p) { return false; }
@@ -331,15 +302,6 @@ static inline void set_freezable(void) {
#define freezable_schedule_hrtimeout_range(expires, delta, mode) \
schedule_hrtimeout_range(expires, delta, mode)
-#define wait_event_freezable(wq, condition) \
- wait_event_interruptible(wq, condition)
-
-#define wait_event_freezable_timeout(wq, condition, timeout) \
- wait_event_interruptible_timeout(wq, condition, timeout)
-
-#define wait_event_freezable_exclusive(wq, condition) \
- wait_event_interruptible_exclusive(wq, condition)
-
#define wait_event_freezekillable(wq, condition) \
wait_event_killable(wq, condition)
--- a/include/linux/wait.h
+++ b/include/linux/wait.h
@@ -267,6 +267,30 @@ do { \
__wait_event(wq, condition); \
} while (0)
+#define __wait_event_freezable(wq, condition) \
+ (void)___wait_event(wq, condition, TASK_INTERRUPTIBLE, 0, 0, \
+ schedule(); try_to_freeze())
+
+/**
+ * wait_event - sleep (or freeze) until a condition gets true
+ * @wq: the waitqueue to wait on
+ * @condition: a C expression for the event to wait for
+ *
+ * The process is put to sleep (TASK_INTERRUPTIBLE -- so as not to contribute
+ * to system load) until the @condition evaluates to true. The
+ * @condition is checked each time the waitqueue @wq is woken up.
+ *
+ * wake_up() has to be called after changing any variable that could
+ * change the result of the wait condition.
+ */
+#define wait_event_freezable(wq, condition) \
+do { \
+ might_sleep(); \
+ if (condition) \
+ break; \
+ __wait_event_freezable(wq, condition); \
+} while (0)
+
#define __wait_event_timeout(wq, condition, timeout) \
___wait_event(wq, ___wait_cond_timeout(condition), \
TASK_UNINTERRUPTIBLE, 0, timeout, \
@@ -300,6 +324,24 @@ do { \
__ret; \
})
+#define __wait_event_freezable_timeout(wq, condition, timeout) \
+ ___wait_event(wq, ___wait_cond_timeout(condition), \
+ TASK_INTERRUPTIBLE, 0, timeout, \
+ __ret = schedule_timeout(__ret); try_to_freeze())
+
+/*
+ * like wait_event_timeout() -- except it uses TASK_INTERRUPTIBLE to avoid
+ * increasing load and is freezable.
+ */
+#define wait_event_freezable_timeout(wq, condition, timeout) \
+({ \
+ long __ret = timeout; \
+ might_sleep(); \
+ if (!___wait_cond_timeout(condition)) \
+ __ret = __wait_event_freezable_timeout(wq, condition, timeout); \
+ __ret; \
+})
+
#define __wait_event_cmd(wq, condition, cmd1, cmd2) \
(void)___wait_event(wq, condition, TASK_UNINTERRUPTIBLE, 0, 0, \
cmd1; schedule(); cmd2)
@@ -479,6 +521,20 @@ do { \
__ret; \
})
+
+#define __wait_event_freezable_exclusive(wq, condition) \
+ ___wait_event(wq, condition, TASK_INTERRUPTIBLE, 1, 0, \
+ schedule(); try_to_freeze())
+
+#define wait_event_freezable_exclusive(wq, condition) \
+({ \
+ int __ret = 0; \
+ might_sleep(); \
+ if (!(condition)) \
+ __ret = __wait_event_freezable_exclusive(wq, condition);\
+ __ret; \
+})
+
#define __wait_event_interruptible_locked(wq, condition, exclusive, irq) \
({ \
^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH 3/7] wait: Remove wait_event_freezekillable()
2014-10-31 11:10 [PATCH 0/7] Various fixes for nested sleep stuff Peter Zijlstra
2014-10-31 11:10 ` [PATCH 1/7] sched,wait: Fix a kthread race with wait_woken() Peter Zijlstra
2014-10-31 11:10 ` [PATCH 2/7] wait: Reimplement wait_event_freezable() Peter Zijlstra
@ 2014-10-31 11:10 ` Peter Zijlstra
2014-10-31 11:10 ` [PATCH 4/7] audit,wait: Fixup kauditd_thread wait loop Peter Zijlstra
` (3 subsequent siblings)
6 siblings, 0 replies; 15+ messages in thread
From: Peter Zijlstra @ 2014-10-31 11:10 UTC (permalink / raw)
To: mingo, linux-kernel
Cc: peter, oleg, rjw, torvalds, tglx, eparis, umgwanakikbuti, marcel,
ebiederm, davem, fengguang.wu, Peter Zijlstra
[-- Attachment #1: peterz-wait-freezable-2.patch --]
[-- Type: text/plain, Size: 1132 bytes --]
There is no user.. make it go away.
Cc: oleg@redhat.com
Cc: Rafael Wysocki <rjw@rjwysocki.net>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
include/linux/freezer.h | 12 ------------
1 file changed, 12 deletions(-)
--- a/include/linux/freezer.h
+++ b/include/linux/freezer.h
@@ -246,15 +246,6 @@ static inline int freezable_schedule_hrt
* defined in <linux/wait.h>
*/
-#define wait_event_freezekillable(wq, condition) \
-({ \
- int __retval; \
- freezer_do_not_count(); \
- __retval = wait_event_killable(wq, (condition)); \
- freezer_count(); \
- __retval; \
-})
-
/* DO NOT ADD ANY NEW CALLERS OF THIS FUNCTION */
#define wait_event_freezekillable_unsafe(wq, condition) \
({ \
@@ -302,9 +293,6 @@ static inline void set_freezable(void) {
#define freezable_schedule_hrtimeout_range(expires, delta, mode) \
schedule_hrtimeout_range(expires, delta, mode)
-#define wait_event_freezekillable(wq, condition) \
- wait_event_killable(wq, condition)
-
#define wait_event_freezekillable_unsafe(wq, condition) \
wait_event_killable(wq, condition)
^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH 4/7] audit,wait: Fixup kauditd_thread wait loop
2014-10-31 11:10 [PATCH 0/7] Various fixes for nested sleep stuff Peter Zijlstra
` (2 preceding siblings ...)
2014-10-31 11:10 ` [PATCH 3/7] wait: Remove wait_event_freezekillable() Peter Zijlstra
@ 2014-10-31 11:10 ` Peter Zijlstra
2014-10-31 11:10 ` [PATCH 5/7] rfcomm: Fix broken wait construct Peter Zijlstra
` (2 subsequent siblings)
6 siblings, 0 replies; 15+ messages in thread
From: Peter Zijlstra @ 2014-10-31 11:10 UTC (permalink / raw)
To: mingo, linux-kernel
Cc: peter, oleg, rjw, torvalds, tglx, eparis, umgwanakikbuti, marcel,
ebiederm, davem, fengguang.wu, Peter Zijlstra
[-- Attachment #1: peter_zijlstra-nested_sleeps_fixes_and_debug_infrastructure.patch --]
[-- Type: text/plain, Size: 1522 bytes --]
The kauditd_thread wait loop is a bit iffy; it has a number of problems:
- calls try_to_freeze() before schedule(); you typically want the
thread to re-evaluate the sleep condition when unfreezing, also
freeze_task() issues a wakeup.
- it unconditionally does the {add,remove}_wait_queue(), even when the
sleep condition is false.
Use wait_event_freezable() that does the right thing.
Cc: mingo@kernel.org
Cc: oleg@redhat.com
Cc: torvalds@linux-foundation.org
Cc: tglx@linutronix.de
Cc: Eric Paris <eparis@redhat.com>
Reported-by: Mike Galbraith <umgwanakikbuti@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20141002102251.GA6324@worktop.programming.kicks-ass.net
---
kernel/audit.c | 11 +----------
1 file changed, 1 insertion(+), 10 deletions(-)
--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -499,7 +499,6 @@ static int kauditd_thread(void *dummy)
set_freezable();
while (!kthread_should_stop()) {
struct sk_buff *skb;
- DECLARE_WAITQUEUE(wait, current);
flush_hold_queue();
@@ -514,16 +513,8 @@ static int kauditd_thread(void *dummy)
audit_printk_skb(skb);
continue;
}
- set_current_state(TASK_INTERRUPTIBLE);
- add_wait_queue(&kauditd_wait, &wait);
- if (!skb_queue_len(&audit_skb_queue)) {
- try_to_freeze();
- schedule();
- }
-
- __set_current_state(TASK_RUNNING);
- remove_wait_queue(&kauditd_wait, &wait);
+ wait_event_freezable(kauditd_wait, skb_queue_len(&audit_skb_queue));
}
return 0;
}
^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH 5/7] rfcomm: Fix broken wait construct
2014-10-31 11:10 [PATCH 0/7] Various fixes for nested sleep stuff Peter Zijlstra
` (3 preceding siblings ...)
2014-10-31 11:10 ` [PATCH 4/7] audit,wait: Fixup kauditd_thread wait loop Peter Zijlstra
@ 2014-10-31 11:10 ` Peter Zijlstra
2014-10-31 11:10 ` [PATCH 6/7] netdev: Fix sleeping inside wait event Peter Zijlstra
2014-10-31 11:10 ` [PATCH 7/7] sched: Use WARN_ONCE for the might_sleep() TASK_RUNNING test Peter Zijlstra
6 siblings, 0 replies; 15+ messages in thread
From: Peter Zijlstra @ 2014-10-31 11:10 UTC (permalink / raw)
To: mingo, linux-kernel
Cc: peter, oleg, rjw, torvalds, tglx, eparis, umgwanakikbuti, marcel,
ebiederm, davem, fengguang.wu, Peter Zijlstra
[-- Attachment #1: peterz-fix-rfcomm.patch --]
[-- Type: text/plain, Size: 1539 bytes --]
rfcomm_run() is a tad broken in that is has a nested wait loop. One
cannot rely on p->state for the outer wait because the inner wait will
overwrite it.
Cc: Marcel Holtmann <marcel@holtmann.org>
Cc: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
net/bluetooth/rfcomm/core.c | 18 ++++++++----------
1 file changed, 8 insertions(+), 10 deletions(-)
--- a/net/bluetooth/rfcomm/core.c
+++ b/net/bluetooth/rfcomm/core.c
@@ -101,11 +101,11 @@ static struct rfcomm_session *rfcomm_ses
#define __get_rpn_stop_bits(line) (((line) >> 2) & 0x1)
#define __get_rpn_parity(line) (((line) >> 3) & 0x7)
+static DECLARE_WAIT_QUEUE_HEAD(rfcomm_wq);
+
static void rfcomm_schedule(void)
{
- if (!rfcomm_thread)
- return;
- wake_up_process(rfcomm_thread);
+ wake_up_all(&rfcomm_wq);
}
/* ---- RFCOMM FCS computation ---- */
@@ -2086,24 +2086,22 @@ static void rfcomm_kill_listener(void)
static int rfcomm_run(void *unused)
{
+ DEFINE_WAIT_FUNC(wait, woken_wake_function);
BT_DBG("");
set_user_nice(current, -10);
rfcomm_add_listener(BDADDR_ANY);
- while (1) {
- set_current_state(TASK_INTERRUPTIBLE);
-
- if (kthread_should_stop())
- break;
+ add_wait_queue(&rfcomm_wq, &wait);
+ while (!kthread_should_stop()) {
/* Process stuff */
rfcomm_process_sessions();
- schedule();
+ wait_woken(&wait, TASK_INTERRUPTIBLE, MAX_SCHEDULE_TIMEOUT);
}
- __set_current_state(TASK_RUNNING);
+ remove_wait_queue(&rfcomm_wq, &wait);
rfcomm_kill_listener();
^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH 6/7] netdev: Fix sleeping inside wait event
2014-10-31 11:10 [PATCH 0/7] Various fixes for nested sleep stuff Peter Zijlstra
` (4 preceding siblings ...)
2014-10-31 11:10 ` [PATCH 5/7] rfcomm: Fix broken wait construct Peter Zijlstra
@ 2014-10-31 11:10 ` Peter Zijlstra
2014-10-31 11:10 ` [PATCH 7/7] sched: Use WARN_ONCE for the might_sleep() TASK_RUNNING test Peter Zijlstra
6 siblings, 0 replies; 15+ messages in thread
From: Peter Zijlstra @ 2014-10-31 11:10 UTC (permalink / raw)
To: mingo, linux-kernel
Cc: peter, oleg, rjw, torvalds, tglx, eparis, umgwanakikbuti, marcel,
ebiederm, davem, fengguang.wu, Peter Zijlstra
[-- Attachment #1: peterz-fix-netdev-device_exit_batch.patch --]
[-- Type: text/plain, Size: 2333 bytes --]
rtnl_lock_unregistering*() take rtnl_lock() -- a mutex -- inside a
wait loop. The wait loop relies on current->state to function, but so
does mutex_lock(), nesting them makes for the inner to destroy the
outer state.
Fix this using the new wait_woken() bits.
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Acked-by: David S. Miller <davem@davemloft.net>
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20141029173110.GE15602@worktop.programming.kicks-ass.net
---
net/core/dev.c | 10 +++++-----
net/core/rtnetlink.c | 10 +++++-----
2 files changed, 10 insertions(+), 10 deletions(-)
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -7196,11 +7196,10 @@ static void __net_exit rtnl_lock_unregis
*/
struct net *net;
bool unregistering;
- DEFINE_WAIT(wait);
+ DEFINE_WAIT_FUNC(wait, woken_wake_function);
+ add_wait_queue(&netdev_unregistering_wq, &wait);
for (;;) {
- prepare_to_wait(&netdev_unregistering_wq, &wait,
- TASK_UNINTERRUPTIBLE);
unregistering = false;
rtnl_lock();
list_for_each_entry(net, net_list, exit_list) {
@@ -7212,9 +7211,10 @@ static void __net_exit rtnl_lock_unregis
if (!unregistering)
break;
__rtnl_unlock();
- schedule();
+
+ wait_woken(&wait, TASK_UNINTERRUPTIBLE, MAX_SCHEDULE_TIMEOUT);
}
- finish_wait(&netdev_unregistering_wq, &wait);
+ remove_wait_queue(&netdev_unregistering_wq, &wait);
}
static void __net_exit default_device_exit_batch(struct list_head *net_list)
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -365,11 +365,10 @@ static void rtnl_lock_unregistering_all(
{
struct net *net;
bool unregistering;
- DEFINE_WAIT(wait);
+ DEFINE_WAIT_FUNC(wait, woken_wake_function);
+ add_wait_queue(&netdev_unregistering_wq, &wait);
for (;;) {
- prepare_to_wait(&netdev_unregistering_wq, &wait,
- TASK_UNINTERRUPTIBLE);
unregistering = false;
rtnl_lock();
for_each_net(net) {
@@ -381,9 +380,10 @@ static void rtnl_lock_unregistering_all(
if (!unregistering)
break;
__rtnl_unlock();
- schedule();
+
+ wait_woken(&wait, TASK_UNINTERRUPTIBLE, MAX_SCHEDULE_TIMEOUT);
}
- finish_wait(&netdev_unregistering_wq, &wait);
+ remove_wait_queue(&netdev_unregistering_wq, &wait);
}
/**
^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH 7/7] sched: Use WARN_ONCE for the might_sleep() TASK_RUNNING test
2014-10-31 11:10 [PATCH 0/7] Various fixes for nested sleep stuff Peter Zijlstra
` (5 preceding siblings ...)
2014-10-31 11:10 ` [PATCH 6/7] netdev: Fix sleeping inside wait event Peter Zijlstra
@ 2014-10-31 11:10 ` Peter Zijlstra
2014-10-31 22:42 ` Oleg Nesterov
6 siblings, 1 reply; 15+ messages in thread
From: Peter Zijlstra @ 2014-10-31 11:10 UTC (permalink / raw)
To: mingo, linux-kernel
Cc: peter, oleg, rjw, torvalds, tglx, eparis, umgwanakikbuti, marcel,
ebiederm, davem, fengguang.wu, Peter Zijlstra
[-- Attachment #1: peterz-wait-warn.patch --]
[-- Type: text/plain, Size: 676 bytes --]
In some cases this can trigger a true flood of output.
Requested-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
kernel/sched/core.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -7301,7 +7301,7 @@ void __might_sleep(const char *file, int
* since we will exit with TASK_RUNNING make sure we enter with it,
* otherwise we will destroy state.
*/
- if (WARN(current->state != TASK_RUNNING,
+ if (WARN_ONCE(current->state != TASK_RUNNING,
"do not call blocking ops when !TASK_RUNNING; "
"state=%lx set at [<%p>] %pS\n",
current->state,
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 2/7] wait: Reimplement wait_event_freezable()
2014-10-31 11:10 ` [PATCH 2/7] wait: Reimplement wait_event_freezable() Peter Zijlstra
@ 2014-10-31 17:41 ` Peter Zijlstra
2014-10-31 21:38 ` Oleg Nesterov
1 sibling, 0 replies; 15+ messages in thread
From: Peter Zijlstra @ 2014-10-31 17:41 UTC (permalink / raw)
To: mingo, linux-kernel
Cc: peter, oleg, rjw, torvalds, tglx, eparis, umgwanakikbuti, marcel,
ebiederm, davem, fengguang.wu
On Fri, Oct 31, 2014 at 12:10:39PM +0100, Peter Zijlstra wrote:
> Provide better implementations of wait_event_freezable() APIs.
>
> The problem is with freezer_do_not_count(), it hides the thread from
> the freezer, even though this thread might not actually freeze/sleep
> at all.
>
> Cc: oleg@redhat.com
> Cc: Rafael Wysocki <rjw@rjwysocki.net>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
This would maybe compile better.. gotta run, will test later.
--- a/include/linux/wait.h
+++ b/include/linux/wait.h
@@ -284,12 +284,13 @@ do { \
* change the result of the wait condition.
*/
#define wait_event_freezable(wq, condition) \
-do { \
+({ \
+ int __ret = 0; \
might_sleep(); \
- if (condition) \
- break; \
- __wait_event_freezable(wq, condition); \
-} while (0)
+ if (!(condition)) \
+ __ret = __wait_event_freezable(wq, condition); \
+ __ret; \
+})
#define __wait_event_timeout(wq, condition, timeout) \
___wait_event(wq, ___wait_cond_timeout(condition), \
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 2/7] wait: Reimplement wait_event_freezable()
2014-10-31 11:10 ` [PATCH 2/7] wait: Reimplement wait_event_freezable() Peter Zijlstra
2014-10-31 17:41 ` Peter Zijlstra
@ 2014-10-31 21:38 ` Oleg Nesterov
2014-10-31 21:48 ` Peter Zijlstra
2014-10-31 21:53 ` Oleg Nesterov
1 sibling, 2 replies; 15+ messages in thread
From: Oleg Nesterov @ 2014-10-31 21:38 UTC (permalink / raw)
To: Peter Zijlstra
Cc: mingo, linux-kernel, peter, rjw, torvalds, tglx, eparis,
umgwanakikbuti, marcel, ebiederm, davem, fengguang.wu
On 10/31, Peter Zijlstra wrote:
>
> Provide better implementations of wait_event_freezable() APIs.
>
> The problem is with freezer_do_not_count(), it hides the thread from
> the freezer, even though this thread might not actually freeze/sleep
> at all.
I agree, wait_event_freezable() is awful. But could you clarify "at all" ?
Sure, the task can be preempted right after it sets, it can do a lot
of things before it calls schedule(), it can be woken after that and
it can run again and do something else before freezer_count() calls
try_to_freeze(), etc.
Is this what you meant?
> +#define __wait_event_freezable(wq, condition) \
> + (void)___wait_event(wq, condition, TASK_INTERRUPTIBLE, 0, 0, \
> + schedule(); try_to_freeze())
I don't think this can work. wait_event_freezable() should be used by
kernel threads and thus we can't rely on TIF_SIGPENDING, freeze_task()
simply does wake_up_state(TASK_INTERRUPTIBLE) in this case.
Just for example, suppose that try_to_freeze_tasks() calls freeze_task()
before this kthread sets current->state = TASK_INTERRUPTIBLE. In this
case __wait_event_freezable()->schedule() will happily sleep and
try_to_freeze_tasks() will fail.
That is why I tried to suggest cmd == freezable_schedule(). Still not
good, but at least this narrows the window and (perhaps) we can improve
this freezable_schedule() later.
But on a second thought... Probably cmd => try_to_freeze(); schedule();
should work. Or just
#define __wait_event_freezable(wq, condition) \
__wait_event_interruptible(wq, ({ try_to_freeze(); (condition); }))
which looks simpler.
Oleg.
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 2/7] wait: Reimplement wait_event_freezable()
2014-10-31 21:38 ` Oleg Nesterov
@ 2014-10-31 21:48 ` Peter Zijlstra
2014-10-31 21:53 ` Oleg Nesterov
1 sibling, 0 replies; 15+ messages in thread
From: Peter Zijlstra @ 2014-10-31 21:48 UTC (permalink / raw)
To: Oleg Nesterov
Cc: mingo, linux-kernel, peter, rjw, torvalds, tglx, eparis,
umgwanakikbuti, marcel, ebiederm, davem, fengguang.wu
On Fri, Oct 31, 2014 at 10:38:02PM +0100, Oleg Nesterov wrote:
> On 10/31, Peter Zijlstra wrote:
> >
> > Provide better implementations of wait_event_freezable() APIs.
> >
> > The problem is with freezer_do_not_count(), it hides the thread from
> > the freezer, even though this thread might not actually freeze/sleep
> > at all.
>
> I agree, wait_event_freezable() is awful. But could you clarify "at all" ?
>
> Sure, the task can be preempted right after it sets, it can do a lot
> of things before it calls schedule(), it can be woken after that and
> it can run again and do something else before freezer_count() calls
> try_to_freeze(), etc.
>
> Is this what you meant?
Yes.
> > +#define __wait_event_freezable(wq, condition) \
> > + (void)___wait_event(wq, condition, TASK_INTERRUPTIBLE, 0, 0, \
> > + schedule(); try_to_freeze())
>
> I don't think this can work.
Yeah, that was horribly broken. defconfig didn't seem to find a usage
site. Wu's robot offered a .config quickly enough though ;-)
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 2/7] wait: Reimplement wait_event_freezable()
2014-10-31 21:38 ` Oleg Nesterov
2014-10-31 21:48 ` Peter Zijlstra
@ 2014-10-31 21:53 ` Oleg Nesterov
1 sibling, 0 replies; 15+ messages in thread
From: Oleg Nesterov @ 2014-10-31 21:53 UTC (permalink / raw)
To: Peter Zijlstra
Cc: mingo, linux-kernel, peter, rjw, torvalds, tglx, eparis,
umgwanakikbuti, marcel, ebiederm, davem, fengguang.wu
On 10/31, Oleg Nesterov wrote:
>
> On 10/31, Peter Zijlstra wrote:
> >
> > Provide better implementations of wait_event_freezable() APIs.
> >
> > The problem is with freezer_do_not_count(), it hides the thread from
> > the freezer, even though this thread might not actually freeze/sleep
> > at all.
>
> I agree, wait_event_freezable() is awful. But could you clarify "at all" ?
>
> Sure, the task can be preempted right after it sets, it can do a lot
> of things before it calls schedule(), it can be woken after that and
> it can run again and do something else before freezer_count() calls
> try_to_freeze(), etc.
>
> Is this what you meant?
>
>
> > +#define __wait_event_freezable(wq, condition) \
> > + (void)___wait_event(wq, condition, TASK_INTERRUPTIBLE, 0, 0, \
> > + schedule(); try_to_freeze())
>
> I don't think this can work. wait_event_freezable() should be used by
> kernel threads and thus we can't rely on TIF_SIGPENDING, freeze_task()
> simply does wake_up_state(TASK_INTERRUPTIBLE) in this case.
>
> Just for example, suppose that try_to_freeze_tasks() calls freeze_task()
> before this kthread sets current->state = TASK_INTERRUPTIBLE. In this
> case __wait_event_freezable()->schedule() will happily sleep and
> try_to_freeze_tasks() will fail.
>
> That is why I tried to suggest cmd == freezable_schedule(). Still not
> good, but at least this narrows the window and (perhaps) we can improve
> this freezable_schedule() later.
>
> But on a second thought... Probably cmd => try_to_freeze(); schedule();
> should work. Or just
>
> #define __wait_event_freezable(wq, condition) \
> __wait_event_interruptible(wq, ({ try_to_freeze(); (condition); }))
>
> which looks simpler.
Ah, but I forgot about another problem... can't we finally kill this ugly
"long save = current->state" code in __refrigerator() ? Nobody seem to
understand why we are doing this, and this looks just wrong.
And this doesn't allow us to do try_to_freeze() + schedule().
Oleg.
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 1/7] sched,wait: Fix a kthread race with wait_woken()
2014-10-31 11:10 ` [PATCH 1/7] sched,wait: Fix a kthread race with wait_woken() Peter Zijlstra
@ 2014-10-31 22:13 ` Oleg Nesterov
0 siblings, 0 replies; 15+ messages in thread
From: Oleg Nesterov @ 2014-10-31 22:13 UTC (permalink / raw)
To: Peter Zijlstra
Cc: mingo, linux-kernel, peter, rjw, torvalds, tglx, eparis,
umgwanakikbuti, marcel, ebiederm, davem, fengguang.wu
On 10/31, Peter Zijlstra wrote:
>
> There is a race between kthread_stop() and the new wait_woken() that
> can result in a lack of progress.
Likewise, the user of wait_woken() can miss any other event which is
not associated with wq we are going to sleep on. Please see below.
> +static inline bool is_kthread_should_stop(void)
> +{
> + return (current->flags & PF_KTHREAD) && kthread_should_stop();
> +}
>
> /*
> * DEFINE_WAIT_FUNC(wait, woken_wake_func);
> @@ -326,7 +331,7 @@ long wait_woken(wait_queue_t *wait, unsi
> * woken_wake_function() such that if we observe WQ_FLAG_WOKEN we must
> * also observe all state before the wakeup.
> */
> - if (!(wait->flags & WQ_FLAG_WOKEN))
> + if (!(wait->flags & WQ_FLAG_WOKEN) && !is_kthread_should_stop())
> timeout = schedule_timeout(timeout);
Well yes, this is more straightforward than other hacks we discussed before.
But see above, this doesn't look flexible enough.
And. This assumes that the user must also check kthread_should_stop(),
otherwise the waiting loop becomes a busy-wait loop.
So I won't argue, but I still think it would be better to allow the user to
do set_task_state() by hand if it needs to check the additional conditions.
Oleg.
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 7/7] sched: Use WARN_ONCE for the might_sleep() TASK_RUNNING test
2014-10-31 11:10 ` [PATCH 7/7] sched: Use WARN_ONCE for the might_sleep() TASK_RUNNING test Peter Zijlstra
@ 2014-10-31 22:42 ` Oleg Nesterov
2014-11-03 7:56 ` Peter Zijlstra
0 siblings, 1 reply; 15+ messages in thread
From: Oleg Nesterov @ 2014-10-31 22:42 UTC (permalink / raw)
To: Peter Zijlstra
Cc: mingo, linux-kernel, peter, rjw, torvalds, tglx, eparis,
umgwanakikbuti, marcel, ebiederm, davem, fengguang.wu
On 10/31, Peter Zijlstra wrote:
>
> In some cases this can trigger a true flood of output.
>
> Requested-by: Ingo Molnar <mingo@kernel.org>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
> kernel/sched/core.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -7301,7 +7301,7 @@ void __might_sleep(const char *file, int
> * since we will exit with TASK_RUNNING make sure we enter with it,
> * otherwise we will destroy state.
> */
> - if (WARN(current->state != TASK_RUNNING,
> + if (WARN_ONCE(current->state != TASK_RUNNING,
Agreed, but sorry for off-topic, can't resist.
Sometimes I hate WARN_ONCE() because you can't reproduce the problem
once again without reboot.
Perhaps WARN_ON_RATELIMIT() should be used more often (not sure about
this particular case). Or, perhaps, we can add a special section for
these "__warned" variables and add, say, sysctl which clears that
section ?
Oleg.
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 7/7] sched: Use WARN_ONCE for the might_sleep() TASK_RUNNING test
2014-10-31 22:42 ` Oleg Nesterov
@ 2014-11-03 7:56 ` Peter Zijlstra
0 siblings, 0 replies; 15+ messages in thread
From: Peter Zijlstra @ 2014-11-03 7:56 UTC (permalink / raw)
To: Oleg Nesterov
Cc: mingo, linux-kernel, peter, rjw, torvalds, tglx, eparis,
umgwanakikbuti, marcel, ebiederm, davem, fengguang.wu
On Fri, Oct 31, 2014 at 11:42:37PM +0100, Oleg Nesterov wrote:
> On 10/31, Peter Zijlstra wrote:
> >
> > In some cases this can trigger a true flood of output.
> >
> > Requested-by: Ingo Molnar <mingo@kernel.org>
> > Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> > ---
> > kernel/sched/core.c | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > --- a/kernel/sched/core.c
> > +++ b/kernel/sched/core.c
> > @@ -7301,7 +7301,7 @@ void __might_sleep(const char *file, int
> > * since we will exit with TASK_RUNNING make sure we enter with it,
> > * otherwise we will destroy state.
> > */
> > - if (WARN(current->state != TASK_RUNNING,
> > + if (WARN_ONCE(current->state != TASK_RUNNING,
>
> Agreed, but sorry for off-topic, can't resist.
>
> Sometimes I hate WARN_ONCE() because you can't reproduce the problem
> once again without reboot.
Yes, and the fact that you can only see the first fail, even if more are
present.
> Perhaps WARN_ON_RATELIMIT() should be used more often (not sure about
> this particular case). Or, perhaps, we can add a special section for
> these "__warned" variables and add, say, sysctl which clears that
> section ?
Yeah, maybe, /debug/warn_once/file/line/enable or whatnot. For now I'll
continue removing ONCEs whenever I feel like it though ;-)
^ permalink raw reply [flat|nested] 15+ messages in thread
end of thread, other threads:[~2014-11-03 7:57 UTC | newest]
Thread overview: 15+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-10-31 11:10 [PATCH 0/7] Various fixes for nested sleep stuff Peter Zijlstra
2014-10-31 11:10 ` [PATCH 1/7] sched,wait: Fix a kthread race with wait_woken() Peter Zijlstra
2014-10-31 22:13 ` Oleg Nesterov
2014-10-31 11:10 ` [PATCH 2/7] wait: Reimplement wait_event_freezable() Peter Zijlstra
2014-10-31 17:41 ` Peter Zijlstra
2014-10-31 21:38 ` Oleg Nesterov
2014-10-31 21:48 ` Peter Zijlstra
2014-10-31 21:53 ` Oleg Nesterov
2014-10-31 11:10 ` [PATCH 3/7] wait: Remove wait_event_freezekillable() Peter Zijlstra
2014-10-31 11:10 ` [PATCH 4/7] audit,wait: Fixup kauditd_thread wait loop Peter Zijlstra
2014-10-31 11:10 ` [PATCH 5/7] rfcomm: Fix broken wait construct Peter Zijlstra
2014-10-31 11:10 ` [PATCH 6/7] netdev: Fix sleeping inside wait event Peter Zijlstra
2014-10-31 11:10 ` [PATCH 7/7] sched: Use WARN_ONCE for the might_sleep() TASK_RUNNING test Peter Zijlstra
2014-10-31 22:42 ` Oleg Nesterov
2014-11-03 7:56 ` Peter Zijlstra
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).