linux-rt-users.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH RT v2] x86/mce: Defer mce wakeups to threads for PREEMPT_RT
@ 2013-04-11 18:33 Steven Rostedt
  2013-04-12 13:31 ` Borislav Petkov
                   ` (4 more replies)
  0 siblings, 5 replies; 14+ messages in thread
From: Steven Rostedt @ 2013-04-11 18:33 UTC (permalink / raw)
  To: LKML, RT
  Cc: Thomas Gleixner, Clark Williams, John Kacur, Tony Luck,
	Borislav Petkov, Mauro Carvalho Chehab, Ingo Molnar,
	H. Peter Anvin

[ This time without the hitchhiking process_32.c cruff on the patch ]

We had a customer report a lockup on a 3.0-rt kernel that had the
following backtrace:

[ffff88107fca3e80] rt_spin_lock_slowlock at ffffffff81499113
[ffff88107fca3f40] rt_spin_lock at ffffffff81499a56
[ffff88107fca3f50] __wake_up at ffffffff81043379
[ffff88107fca3f80] mce_notify_irq at ffffffff81017328
[ffff88107fca3f90] intel_threshold_interrupt at ffffffff81019508
[ffff88107fca3fa0] smp_threshold_interrupt at ffffffff81019fc1
[ffff88107fca3fb0] threshold_interrupt at ffffffff814a1853


It actually bugged because the lock was taken by the same owner that
already had that lock. What happened was the thread that was setting
itself on a wait queue had the lock when an MCE triggered. The MCE
interrupt does a wake up on its wait list and grabs the same lock.

NOTE: THIS IS NOT A BUG ON MAINLINE

Sorry for yelling, but as I Cc'd mainline maintainers I want them to
know that this is an PREEMPT_RT bug only. I only Cc'd them for advice.

On PREEMPT_RT the wait queue locks are converted from normal
"spin_locks" into an rt_mutex (see the rt_spin_lock_slowlock above).
These are not to be taken by hard interrupt context. This usually isn't
a problem as most all interrupts in PREEMPT_RT are converted into
schedulable threads. Unfortunately that's not the case with the MCE irq.

As wait queue locks are notorious for long hold times, we can not
convert them to raw_spin_locks without causing issues with -rt. But
Thomas has created a "simple-wait" structure that uses raw spin locks
which may have been a good fit.

Unfortunately, wait queues are not the only issue, as the mce_notify_irq
also does a schedule_work(), which grabs the workqueue spin locks that
have the exact same issue.

Thus, this patch I'm proposing is to move the actual work of the MCE
interrupt into a helper thread that gets woken up on the MCE interrupt
and does the work in a schedulable context.

NOTE: THIS PATCH ONLY CHANGES THE BEHAVIOR WHEN PREEMPT_RT IS SET

Oops, sorry for yelling again, but I want to stress that I keep the same
behavior of mainline when PREEMPT_RT is not set. Thus, this only changes
the MCE behavior when PREEMPT_RT is configured.

Comments?

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index e8d8ad0..060e473 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -18,6 +18,7 @@
 #include <linux/rcupdate.h>
 #include <linux/kobject.h>
 #include <linux/uaccess.h>
+#include <linux/kthread.h>
 #include <linux/kdebug.h>
 #include <linux/kernel.h>
 #include <linux/percpu.h>
@@ -1308,6 +1309,61 @@ static void mce_do_trigger(struct work_struct *work)
 
 static DECLARE_WORK(mce_trigger_work, mce_do_trigger);
 
+static void __mce_notify_work(void)
+{
+	/* Not more than two messages every minute */
+	static DEFINE_RATELIMIT_STATE(ratelimit, 60*HZ, 2);
+
+	/* wake processes polling /dev/mcelog */
+	wake_up_interruptible(&mce_chrdev_wait);
+
+	/*
+	 * There is no risk of missing notifications because
+	 * work_pending is always cleared before the function is
+	 * executed.
+	 */
+	if (mce_helper[0] && !work_pending(&mce_trigger_work))
+		schedule_work(&mce_trigger_work);
+
+	if (__ratelimit(&ratelimit))
+		pr_info(HW_ERR "Machine check events logged\n");
+}
+
+#ifdef CONFIG_PREEMPT_RT_FULL
+struct task_struct *mce_notify_helper;
+
+static int mce_notify_helper_thread(void *unused)
+{
+	while (!kthread_should_stop()) {
+		__mce_notify_work();
+		set_current_state(TASK_INTERRUPTIBLE);
+		schedule();
+	}
+	return 0;
+}
+
+static int mce_notify_work_init(void)
+{
+	mce_notify_helper = kthread_create(mce_notify_helper_thread, NULL,
+					   "mce-notify");
+	if (!mce_notify_helper)
+		return -ENOMEM;
+
+	return 0;
+}
+
+static void mce_notify_work()
+{
+	wake_up_process(mce_notify_helper);
+}
+#else
+static void mce_notify_work(void)
+{
+	__mce_notify_work();
+}
+static inline int mce_notify_work_init(void) { return 0; }
+#endif
+
 /*
  * Notify the user(s) about new machine check events.
  * Can be called from interrupt context, but not from machine check/NMI
@@ -1315,24 +1371,8 @@ static DECLARE_WORK(mce_trigger_work, mce_do_trigger);
  */
 int mce_notify_irq(void)
 {
-	/* Not more than two messages every minute */
-	static DEFINE_RATELIMIT_STATE(ratelimit, 60*HZ, 2);
-
 	if (test_and_clear_bit(0, &mce_need_notify)) {
-		/* wake processes polling /dev/mcelog */
-		wake_up_interruptible(&mce_chrdev_wait);
-
-		/*
-		 * There is no risk of missing notifications because
-		 * work_pending is always cleared before the function is
-		 * executed.
-		 */
-		if (mce_helper[0] && !work_pending(&mce_trigger_work))
-			schedule_work(&mce_trigger_work);
-
-		if (__ratelimit(&ratelimit))
-			pr_info(HW_ERR "Machine check events logged\n");
-
+		mce_notify_work();
 		return 1;
 	}
 	return 0;
@@ -2375,6 +2415,8 @@ static __init int mcheck_init_device(void)
 	/* register character device /dev/mcelog */
 	misc_register(&mce_chrdev_device);
 
+	err = mce_notify_work_init();
+
 	return err;
 }
 device_initcall_sync(mcheck_init_device);

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH RT v2] x86/mce: Defer mce wakeups to threads for PREEMPT_RT
  2013-04-11 18:33 [PATCH RT v2] x86/mce: Defer mce wakeups to threads for PREEMPT_RT Steven Rostedt
@ 2013-04-12 13:31 ` Borislav Petkov
  2013-04-12 13:42   ` Steven Rostedt
  2013-04-12 13:38 ` Borislav Petkov
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 14+ messages in thread
From: Borislav Petkov @ 2013-04-12 13:31 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: LKML, RT, Thomas Gleixner, Clark Williams, John Kacur, Tony Luck,
	Mauro Carvalho Chehab, Ingo Molnar, H. Peter Anvin

On Thu, Apr 11, 2013 at 02:33:34PM -0400, Steven Rostedt wrote:
> Comments?

I don't have a problem with it except with the yelling - I think I've
gone deaf. And for Rostedt code it is pretty clean and even *I* can
understand it :-).

Tony?

Steve, stupid question: since this is -rt only, why even push it
upstream?

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH RT v2] x86/mce: Defer mce wakeups to threads for PREEMPT_RT
  2013-04-11 18:33 [PATCH RT v2] x86/mce: Defer mce wakeups to threads for PREEMPT_RT Steven Rostedt
  2013-04-12 13:31 ` Borislav Petkov
@ 2013-04-12 13:38 ` Borislav Petkov
  2013-04-12 13:44   ` Steven Rostedt
  2013-04-25 16:44 ` Sebastian Andrzej Siewior
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 14+ messages in thread
From: Borislav Petkov @ 2013-04-12 13:38 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: LKML, RT, Thomas Gleixner, Clark Williams, John Kacur, Tony Luck,
	Mauro Carvalho Chehab, Ingo Molnar, H. Peter Anvin

On Thu, Apr 11, 2013 at 02:33:34PM -0400, Steven Rostedt wrote:
> diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
> index e8d8ad0..060e473 100644
> --- a/arch/x86/kernel/cpu/mcheck/mce.c
> +++ b/arch/x86/kernel/cpu/mcheck/mce.c
> @@ -18,6 +18,7 @@
>  #include <linux/rcupdate.h>
>  #include <linux/kobject.h>
>  #include <linux/uaccess.h>
> +#include <linux/kthread.h>
>  #include <linux/kdebug.h>
>  #include <linux/kernel.h>
>  #include <linux/percpu.h>
> @@ -1308,6 +1309,61 @@ static void mce_do_trigger(struct work_struct *work)
>  
>  static DECLARE_WORK(mce_trigger_work, mce_do_trigger);
>  
> +static void __mce_notify_work(void)
> +{
> +	/* Not more than two messages every minute */
> +	static DEFINE_RATELIMIT_STATE(ratelimit, 60*HZ, 2);
> +
> +	/* wake processes polling /dev/mcelog */
> +	wake_up_interruptible(&mce_chrdev_wait);
> +
> +	/*
> +	 * There is no risk of missing notifications because
> +	 * work_pending is always cleared before the function is
> +	 * executed.
> +	 */

You must be using some tree != mainline because this comment is gone
upstream and that whole second hunk below which moves it here doesn't
apply.

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH RT v2] x86/mce: Defer mce wakeups to threads for PREEMPT_RT
  2013-04-12 13:31 ` Borislav Petkov
@ 2013-04-12 13:42   ` Steven Rostedt
  2013-04-12 21:44     ` Luck, Tony
  0 siblings, 1 reply; 14+ messages in thread
From: Steven Rostedt @ 2013-04-12 13:42 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: LKML, RT, Thomas Gleixner, Clark Williams, John Kacur, Tony Luck,
	Mauro Carvalho Chehab, Ingo Molnar, H. Peter Anvin

On Fri, 2013-04-12 at 15:31 +0200, Borislav Petkov wrote:
> On Thu, Apr 11, 2013 at 02:33:34PM -0400, Steven Rostedt wrote:
> > Comments?
> 
> I don't have a problem with it except with the yelling - I think I've
> gone deaf. And for Rostedt code it is pretty clean and even *I* can
> understand it :-).
> 
> Tony?
> 
> Steve, stupid question: since this is -rt only, why even push it
> upstream?
> 

I'm not (yet). But I just wanted to make sure there wasn't any little
subtleties that I might be missing.

When -rt goes mainline, code like this will be going mainline too, thus
the maintainers will see it then.

Thanks!

-- Steve
  


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH RT v2] x86/mce: Defer mce wakeups to threads for PREEMPT_RT
  2013-04-12 13:38 ` Borislav Petkov
@ 2013-04-12 13:44   ` Steven Rostedt
  0 siblings, 0 replies; 14+ messages in thread
From: Steven Rostedt @ 2013-04-12 13:44 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: LKML, RT, Thomas Gleixner, Clark Williams, John Kacur, Tony Luck,
	Mauro Carvalho Chehab, Ingo Molnar, H. Peter Anvin

On Fri, 2013-04-12 at 15:38 +0200, Borislav Petkov wrote:

> > +static void __mce_notify_work(void)
> > +{
> > +	/* Not more than two messages every minute */
> > +	static DEFINE_RATELIMIT_STATE(ratelimit, 60*HZ, 2);
> > +
> > +	/* wake processes polling /dev/mcelog */
> > +	wake_up_interruptible(&mce_chrdev_wait);
> > +
> > +	/*
> > +	 * There is no risk of missing notifications because
> > +	 * work_pending is always cleared before the function is
> > +	 * executed.
> > +	 */
> 
> You must be using some tree != mainline because this comment is gone
> upstream and that whole second hunk below which moves it here doesn't
> apply.
> 

I forgot to state that I based this off of 3.6-rt, and will be
backporting it to 3.0-rt as that's the kernel that the bug was reported
on. But the problem still exists in mainline as well (for -rt).

-- Steve

^ permalink raw reply	[flat|nested] 14+ messages in thread

* RE: [PATCH RT v2] x86/mce: Defer mce wakeups to threads for PREEMPT_RT
  2013-04-12 13:42   ` Steven Rostedt
@ 2013-04-12 21:44     ` Luck, Tony
  0 siblings, 0 replies; 14+ messages in thread
From: Luck, Tony @ 2013-04-12 21:44 UTC (permalink / raw)
  To: Steven Rostedt, Borislav Petkov
  Cc: LKML, RT, Thomas Gleixner, Clark Williams, John Kacur,
	Mauro Carvalho Chehab, Ingo Molnar, H. Peter Anvin

> I'm not (yet). But I just wanted to make sure there wasn't any little
> subtleties that I might be missing.

I don't think there are any hidden subtleties ... if there are then they are hidden from me too.

-Tony

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH RT v2] x86/mce: Defer mce wakeups to threads for PREEMPT_RT
  2013-04-11 18:33 [PATCH RT v2] x86/mce: Defer mce wakeups to threads for PREEMPT_RT Steven Rostedt
  2013-04-12 13:31 ` Borislav Petkov
  2013-04-12 13:38 ` Borislav Petkov
@ 2013-04-25 16:44 ` Sebastian Andrzej Siewior
  2013-04-25 17:09   ` Steven Rostedt
  2013-04-26  8:24 ` Sebastian Andrzej Siewior
  2013-04-26  8:41 ` Sebastian Andrzej Siewior
  4 siblings, 1 reply; 14+ messages in thread
From: Sebastian Andrzej Siewior @ 2013-04-25 16:44 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: LKML, RT, Thomas Gleixner, Clark Williams, John Kacur, Tony Luck,
	Borislav Petkov, Mauro Carvalho Chehab, Ingo Molnar,
	H. Peter Anvin

* Steven Rostedt | 2013-04-11 14:33:34 [-0400]:

>Comments?

So you don't mind if I take this for v3.8?
I fixed

|arch/x86/kernel/cpu/mcheck/mce.c:1392:13: warning: function declaration isn’t a prototype [-Wstrict-prototypes]

>+static void mce_notify_work()
                             ^^

Sebastian
--
To unsubscribe from this list: send the line "unsubscribe linux-rt-users" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH RT v2] x86/mce: Defer mce wakeups to threads for PREEMPT_RT
  2013-04-25 16:44 ` Sebastian Andrzej Siewior
@ 2013-04-25 17:09   ` Steven Rostedt
  2013-04-26  8:22     ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 14+ messages in thread
From: Steven Rostedt @ 2013-04-25 17:09 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: LKML, RT, Thomas Gleixner, Clark Williams, John Kacur, Tony Luck,
	Borislav Petkov, Mauro Carvalho Chehab, Ingo Molnar,
	H. Peter Anvin

On Thu, 2013-04-25 at 18:44 +0200, Sebastian Andrzej Siewior wrote:
> * Steven Rostedt | 2013-04-11 14:33:34 [-0400]:
> 
> >Comments?
> 
> So you don't mind if I take this for v3.8?
> I fixed
> 
> |arch/x86/kernel/cpu/mcheck/mce.c:1392:13: warning: function declaration isn’t a prototype [-Wstrict-prototypes]
> 
> >+static void mce_notify_work()

Thanks, I didn't look hard at the warnings.

-- Steve


--
To unsubscribe from this list: send the line "unsubscribe linux-rt-users" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH RT v2] x86/mce: Defer mce wakeups to threads for PREEMPT_RT
  2013-04-25 17:09   ` Steven Rostedt
@ 2013-04-26  8:22     ` Sebastian Andrzej Siewior
  0 siblings, 0 replies; 14+ messages in thread
From: Sebastian Andrzej Siewior @ 2013-04-26  8:22 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: LKML, RT, Thomas Gleixner, Clark Williams, John Kacur, Tony Luck,
	Borislav Petkov, Mauro Carvalho Chehab, Ingo Molnar,
	H. Peter Anvin

* Steven Rostedt | 2013-04-25 13:09:37 [-0400]:

>Thanks, I didn't look hard at the warnings.

Now that I booted the kernel I see this

|INFO: task mce-notify:78 blocked for more than 120 seconds.
|"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
|mce-notify      D 00000086     0    78      2 0x00000000
| f2e1bf2c 00000096 f2e1bebc 00000086 f2e1becc c1466606 f3440000 00000001
| c1689000 c106dd0a f2cdddf0 f3471f50 00000000 c1690f00 00000007 00000006
| c146662d f2cdddf0 00000282 00000001 f3449ef8 00000282 f2e1bf10 c106b67d
|Call Trace:
| [<c1466606>] ? _raw_spin_unlock_irqrestore+0x36/0x70
| [<c106dd0a>] ? try_to_wake_up+0x5a/0x260
| [<c146662d>] ? _raw_spin_unlock_irqrestore+0x5d/0x70
| [<c106b67d>] ? sub_preempt_count+0x4d/0xb0
| [<c1466606>] ? _raw_spin_unlock_irqrestore+0x36/0x70
| [<c101b620>] ? set_bank+0x50/0x50
| [<c1464d7e>] schedule+0x1e/0x50
| [<c105bb07>] kthread+0x67/0x90
| [<c1466662>] ? _raw_spin_unlock_irq+0x22/0x60
| [<c14671b7>] ret_from_kernel_thread+0x1b/0x28
| [<c105baa0>] ? __init_kthread_worker+0x80/0x80
|no locks held by mce-notify/78.

because the new thread is still TASK_UNINTERRUPTIBLE and nobody wakes it
up. So I did this:

diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index c2d6dc7..332e133 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -1371,17 +1371,19 @@ struct task_struct *mce_notify_helper;
 
 static int mce_notify_helper_thread(void *unused)
 {
-	while (!kthread_should_stop()) {
-		__mce_notify_work();
+	while (1) {
 		set_current_state(TASK_INTERRUPTIBLE);
 		schedule();
+		if (kthread_should_stop())
+			break;
+		__mce_notify_work();
 	}
 	return 0;
 }
 
 static int mce_notify_work_init(void)
 {
-	mce_notify_helper = kthread_create(mce_notify_helper_thread, NULL,
+	mce_notify_helper = kthread_run(mce_notify_helper_thread, NULL,
 					   "mce-notify");
 	if (!mce_notify_helper)
 		return -ENOMEM;

>
>-- Steve

Sebastian

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH RT v2] x86/mce: Defer mce wakeups to threads for PREEMPT_RT
  2013-04-11 18:33 [PATCH RT v2] x86/mce: Defer mce wakeups to threads for PREEMPT_RT Steven Rostedt
                   ` (2 preceding siblings ...)
  2013-04-25 16:44 ` Sebastian Andrzej Siewior
@ 2013-04-26  8:24 ` Sebastian Andrzej Siewior
  2013-05-02 14:26   ` Steven Rostedt
  2013-04-26  8:41 ` Sebastian Andrzej Siewior
  4 siblings, 1 reply; 14+ messages in thread
From: Sebastian Andrzej Siewior @ 2013-04-26  8:24 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: LKML, RT, Thomas Gleixner, Clark Williams, John Kacur, Tony Luck,
	Borislav Petkov, Mauro Carvalho Chehab, Ingo Molnar,
	H. Peter Anvin

* Steven Rostedt | 2013-04-11 14:33:34 [-0400]:

>diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
>index e8d8ad0..060e473 100644
>--- a/arch/x86/kernel/cpu/mcheck/mce.c
>+++ b/arch/x86/kernel/cpu/mcheck/mce.c
>@@ -1308,6 +1309,61 @@ static void mce_do_trigger(struct work_struct *work)
> 
> static DECLARE_WORK(mce_trigger_work, mce_do_trigger);
> 
>+static void __mce_notify_work(void)
>+{
>+	/* Not more than two messages every minute */
>+	static DEFINE_RATELIMIT_STATE(ratelimit, 60*HZ, 2);
>+
>+	/* wake processes polling /dev/mcelog */
>+	wake_up_interruptible(&mce_chrdev_wait);
>+
>+	/*
>+	 * There is no risk of missing notifications because
>+	 * work_pending is always cleared before the function is
>+	 * executed.
>+	 */
>+	if (mce_helper[0] && !work_pending(&mce_trigger_work))
>+		schedule_work(&mce_trigger_work);

Why is here this work_pending() check? You can't enqueue a work item
twice.

Sebastian

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH RT v2] x86/mce: Defer mce wakeups to threads for PREEMPT_RT
  2013-04-11 18:33 [PATCH RT v2] x86/mce: Defer mce wakeups to threads for PREEMPT_RT Steven Rostedt
                   ` (3 preceding siblings ...)
  2013-04-26  8:24 ` Sebastian Andrzej Siewior
@ 2013-04-26  8:41 ` Sebastian Andrzej Siewior
  2013-05-02 14:33   ` Steven Rostedt
  4 siblings, 1 reply; 14+ messages in thread
From: Sebastian Andrzej Siewior @ 2013-04-26  8:41 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: LKML, RT, Thomas Gleixner, Clark Williams, John Kacur, Tony Luck,
	Borislav Petkov, Mauro Carvalho Chehab, Ingo Molnar,
	H. Peter Anvin

* Steven Rostedt | 2013-04-11 14:33:34 [-0400]:

>As wait queue locks are notorious for long hold times, we can not
>convert them to raw_spin_locks without causing issues with -rt. But
>Thomas has created a "simple-wait" structure that uses raw spin locks
>which may have been a good fit.
>
>Unfortunately, wait queues are not the only issue, as the mce_notify_irq
>also does a schedule_work(), which grabs the workqueue spin locks that
>have the exact same issue.

mce_notify_irq() can use simple_waitqueue, no?
The other issue is that mce_report_event() is scheduling a per-cpu
workqueue (mce_schedule_work) in case of a memory fault. This has the
same issue.

Sebastian

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH RT v2] x86/mce: Defer mce wakeups to threads for PREEMPT_RT
  2013-04-26  8:24 ` Sebastian Andrzej Siewior
@ 2013-05-02 14:26   ` Steven Rostedt
  0 siblings, 0 replies; 14+ messages in thread
From: Steven Rostedt @ 2013-05-02 14:26 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: LKML, RT, Thomas Gleixner, Clark Williams, John Kacur, Tony Luck,
	Borislav Petkov, Mauro Carvalho Chehab, Ingo Molnar,
	H. Peter Anvin

Grumble, somehow these emails got lost in the crowd.

On Fri, 2013-04-26 at 10:24 +0200, Sebastian Andrzej Siewior wrote:
> * Steven Rostedt | 2013-04-11 14:33:34 [-0400]:
> 
> >diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
> >index e8d8ad0..060e473 100644
> >--- a/arch/x86/kernel/cpu/mcheck/mce.c
> >+++ b/arch/x86/kernel/cpu/mcheck/mce.c
> >@@ -1308,6 +1309,61 @@ static void mce_do_trigger(struct work_struct *work)
> > 
> > static DECLARE_WORK(mce_trigger_work, mce_do_trigger);
> > 
> >+static void __mce_notify_work(void)
> >+{
> >+	/* Not more than two messages every minute */
> >+	static DEFINE_RATELIMIT_STATE(ratelimit, 60*HZ, 2);
> >+
> >+	/* wake processes polling /dev/mcelog */
> >+	wake_up_interruptible(&mce_chrdev_wait);
> >+
> >+	/*
> >+	 * There is no risk of missing notifications because
> >+	 * work_pending is always cleared before the function is
> >+	 * executed.
> >+	 */
> >+	if (mce_helper[0] && !work_pending(&mce_trigger_work))
> >+		schedule_work(&mce_trigger_work);
> 
> Why is here this work_pending() check? You can't enqueue a work item
> twice.

Yep, that doesn't look needed. Looking at the current code we have this
commit:

commit 4d899be584d4b4c5d6b49d655176b25cebf6ff1a
Author: Tejun Heo <tj@kernel.org>
Date:   Fri Dec 21 17:57:05 2012 -0800

    x86/mce: don't use [delayed_]work_pending()
    
    There's no need to test whether a (delayed) work item in pending
    before queueing, flushing or cancelling it.  Most uses are unnecessary
    and quite a few of them are buggy.
    
    Remove unnecessary pending tests from x86/mce.  Only compile tested.
    
    v2: Local var work removed from mce_schedule_work() as suggested by
        Borislav.


-- Steve

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH RT v2] x86/mce: Defer mce wakeups to threads for PREEMPT_RT
  2013-04-26  8:41 ` Sebastian Andrzej Siewior
@ 2013-05-02 14:33   ` Steven Rostedt
  2013-05-03  9:31     ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 14+ messages in thread
From: Steven Rostedt @ 2013-05-02 14:33 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: LKML, RT, Thomas Gleixner, Clark Williams, John Kacur, Tony Luck,
	Borislav Petkov, Mauro Carvalho Chehab, Ingo Molnar,
	H. Peter Anvin

On Fri, 2013-04-26 at 10:41 +0200, Sebastian Andrzej Siewior wrote:
> * Steven Rostedt | 2013-04-11 14:33:34 [-0400]:
> 
> >As wait queue locks are notorious for long hold times, we can not
> >convert them to raw_spin_locks without causing issues with -rt. But
> >Thomas has created a "simple-wait" structure that uses raw spin locks
> >which may have been a good fit.
> >
> >Unfortunately, wait queues are not the only issue, as the mce_notify_irq
> >also does a schedule_work(), which grabs the workqueue spin locks that
> >have the exact same issue.
> 
> mce_notify_irq() can use simple_waitqueue, no?

Yeah, and I went down that path.

But it also schedules work, which has the issue.

> The other issue is that mce_report_event() is scheduling a per-cpu
> workqueue (mce_schedule_work) in case of a memory fault. This has the
> same issue.

Yeah, that looks like it can be an issue too. I wonder if we can use the
same thread and use flags check what to do. Atomically set the flag for
the function to perform, and then have the thread clear it before doing
the function and only go to sleep when all flags are cleared.

-- Steve

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH RT v2] x86/mce: Defer mce wakeups to threads for PREEMPT_RT
  2013-05-02 14:33   ` Steven Rostedt
@ 2013-05-03  9:31     ` Sebastian Andrzej Siewior
  0 siblings, 0 replies; 14+ messages in thread
From: Sebastian Andrzej Siewior @ 2013-05-03  9:31 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: LKML, RT, Thomas Gleixner, Clark Williams, John Kacur, Tony Luck,
	Borislav Petkov, Mauro Carvalho Chehab, Ingo Molnar,
	H. Peter Anvin

On 05/02/2013 04:33 PM, Steven Rostedt wrote:
>> mce_notify_irq() can use simple_waitqueue, no?
> 
> Yeah, and I went down that path.
> 
> But it also schedules work, which has the issue.

Hmm, okay.

>> The other issue is that mce_report_event() is scheduling a per-cpu
>> workqueue (mce_schedule_work) in case of a memory fault. This has the
>> same issue.
> 
> Yeah, that looks like it can be an issue too. I wonder if we can use the
> same thread and use flags check what to do. Atomically set the flag for
> the function to perform, and then have the thread clear it before doing
> the function and only go to sleep when all flags are cleared.

This should work. It uses per-cpu workthreads, not sure why. Maybe to
avoid locking issues when invoked from NMI.

> 
> -- Steve

Sebastian

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2013-05-03  9:31 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-04-11 18:33 [PATCH RT v2] x86/mce: Defer mce wakeups to threads for PREEMPT_RT Steven Rostedt
2013-04-12 13:31 ` Borislav Petkov
2013-04-12 13:42   ` Steven Rostedt
2013-04-12 21:44     ` Luck, Tony
2013-04-12 13:38 ` Borislav Petkov
2013-04-12 13:44   ` Steven Rostedt
2013-04-25 16:44 ` Sebastian Andrzej Siewior
2013-04-25 17:09   ` Steven Rostedt
2013-04-26  8:22     ` Sebastian Andrzej Siewior
2013-04-26  8:24 ` Sebastian Andrzej Siewior
2013-05-02 14:26   ` Steven Rostedt
2013-04-26  8:41 ` Sebastian Andrzej Siewior
2013-05-02 14:33   ` Steven Rostedt
2013-05-03  9:31     ` Sebastian Andrzej Siewior

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).