cpufreq Archive on lore.kernel.org
 help / color / mirror / Atom feed
* Re: Linux v2.6.18-rc3
       [not found] ` <4807377b0607302113i4e984ff6j1ebae5562148907c@mail.gmail.com>
@ 2006-07-31  4:27   ` Andrew Morton
  2006-07-31 14:54     ` Alan Stern
  0 siblings, 1 reply; 18+ messages in thread
From: Andrew Morton @ 2006-07-31  4:27 UTC (permalink / raw)
  To: Jesse Brandeburg; +Cc: linux-kernel, torvalds, Alan Stern, cpufreq

On Sun, 30 Jul 2006 21:13:48 -0700
"Jesse Brandeburg" <jesse.brandeburg@gmail.com> wrote:

> On 7/29/06, Linus Torvalds <torvalds@osdl.org> wrote:
> >
> > Ok, this missed a week (it should really have been -rc4, and we should
> > have had a -rc3 a week ago), but the fact is, with a lot of people at the
> > kernel summit and at OLS, it was so quiet for a week that there simply was
> > no point.
> 
> not sure if this is a regression or not, get this on my IBM thinkpad
> T43 when resuming from S3 or from hibernate to disk.
> 
> acpi acpi: suspend
> PM: Entering mem sleep
> Intel machine check architecture supported.
> Intel machine check reporting enabled on CPU#0.
> Back to C!
> BUG: sleeping function called from invalid context at kernel/rwsem.c:20
> in_atomic():0, irqs_disabled():1
>  [<c012d638>] down_read+0x12/0x1f
>  [<c012605b>] blocking_notifier_call_chain+0xe/0x29
>  [<c029199a>] cpufreq_resume+0x118/0x13f
>  [<c0231b68>] __sysdev_resume+0x20/0x53
>  [<c0231ca9>] sysdev_resume+0x16/0x47
>  [<c0235f93>] device_power_up+0x5/0xa
>  [<c013358d>] suspend_enter+0x3b/0x44
>  [<c011b644>] printk+0x1b/0x1f
>  [<c01336fe>] enter_state+0x168/0x198
>  [<c01337b3>] state_store+0x85/0x99
>  [<c013372e>] state_store+0x0/0x99
>  [<c019047a>] subsys_attr_store+0x1e/0x22
>  [<c01906ca>] sysfs_write_file+0xa6/0xcc
>  [<c0190624>] sysfs_write_file+0x0/0xcc
>  [<c015ae52>] vfs_write+0xa8/0x159
>  [<c015b398>] sys_write+0x41/0x67
>  [<c0102bc9>] sysenter_past_esp+0x56/0x79
> PM: Finishing wakeup.
> acpi acpi: resuming
> 
> full dmesg and .config attached, I can test patches.

I think this is the cpufreq problem wherein it sometimes requires that the
notifier chain be traversed from atomic context and at other times it
requires that sleeping functions be callable from within the traversal. 
IOW: we're screwed whatever type of locking we use on that chain.

I think Alan is cooking up a scheme wherein we fix this with an srcu-locked
notifier chain.  If so, it'd be nice to get that moving along a bit?

If not, I'm not sure what the fix is - perhaps create a second notifier
chain which has the same contents but uses a different locking approach?

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Linux v2.6.18-rc3
  2006-07-31  4:27   ` Linux v2.6.18-rc3 Andrew Morton
@ 2006-07-31 14:54     ` Alan Stern
  2006-07-31 15:11       ` Andrew Morton
  0 siblings, 1 reply; 18+ messages in thread
From: Alan Stern @ 2006-07-31 14:54 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Jesse Brandeburg, linux-kernel, torvalds, cpufreq

On Sun, 30 Jul 2006, Andrew Morton wrote:

> On Sun, 30 Jul 2006 21:13:48 -0700
> "Jesse Brandeburg" <jesse.brandeburg@gmail.com> wrote:
> 
> > On 7/29/06, Linus Torvalds <torvalds@osdl.org> wrote:
> > >
> > > Ok, this missed a week (it should really have been -rc4, and we should
> > > have had a -rc3 a week ago), but the fact is, with a lot of people at the
> > > kernel summit and at OLS, it was so quiet for a week that there simply was
> > > no point.
> > 
> > not sure if this is a regression or not, get this on my IBM thinkpad
> > T43 when resuming from S3 or from hibernate to disk.
> > 
> > acpi acpi: suspend
> > PM: Entering mem sleep
> > Intel machine check architecture supported.
> > Intel machine check reporting enabled on CPU#0.
> > Back to C!
> > BUG: sleeping function called from invalid context at kernel/rwsem.c:20
> > in_atomic():0, irqs_disabled():1
> >  [<c012d638>] down_read+0x12/0x1f
> >  [<c012605b>] blocking_notifier_call_chain+0xe/0x29
> >  [<c029199a>] cpufreq_resume+0x118/0x13f
> >  [<c0231b68>] __sysdev_resume+0x20/0x53
> >  [<c0231ca9>] sysdev_resume+0x16/0x47
> >  [<c0235f93>] device_power_up+0x5/0xa
> >  [<c013358d>] suspend_enter+0x3b/0x44
> >  [<c011b644>] printk+0x1b/0x1f
> >  [<c01336fe>] enter_state+0x168/0x198
> >  [<c01337b3>] state_store+0x85/0x99
> >  [<c013372e>] state_store+0x0/0x99
> >  [<c019047a>] subsys_attr_store+0x1e/0x22
> >  [<c01906ca>] sysfs_write_file+0xa6/0xcc
> >  [<c0190624>] sysfs_write_file+0x0/0xcc
> >  [<c015ae52>] vfs_write+0xa8/0x159
> >  [<c015b398>] sys_write+0x41/0x67
> >  [<c0102bc9>] sysenter_past_esp+0x56/0x79
> > PM: Finishing wakeup.
> > acpi acpi: resuming
> > 
> > full dmesg and .config attached, I can test patches.
> 
> I think this is the cpufreq problem wherein it sometimes requires that the
> notifier chain be traversed from atomic context and at other times it
> requires that sleeping functions be callable from within the traversal. 
> IOW: we're screwed whatever type of locking we use on that chain.

I have looked at that problem more closely, and my earlier understanding
wasn't quite right.  It's not that the context needs to be atomic at some
times but not others -- it should always be a process context.  The
problem is that the suspend and resume traversals are done at a time when
interrupts need to remain disabled, since cpufreq registers its drivers as
sysdevs.  (Kind of like SYSTEM_BOOTING, except that system_state isn't set
to anything special.)  Because the down_read() call that protects the
notifier chain isn't allowed when interrupts are disabled, the BUG occurs.

> I think Alan is cooking up a scheme wherein we fix this with an srcu-locked
> notifier chain.  If so, it'd be nice to get that moving along a bit?

Yes; protecting the notifier chain by SRCU instead of an rwsem will 
prevent the problem.  It's a trivial change, except for one thing: SRCU 
structures require initialization at runtime before they can be used.  
This initialization must be done before any driver tries to register on 
the cpufreq transition notifier chain.

If someone could give me a hint where a good place would be to carry out
the initialization, I'd appreciate it.  Would an initcall be appropriate?  
And if so, which sort of initcall?  core_initcall?  The only requirement 
is that alloc_percpu() must be available.

Alan Stern

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Linux v2.6.18-rc3
  2006-07-31 14:54     ` Alan Stern
@ 2006-07-31 15:11       ` Andrew Morton
  2006-07-31 15:59         ` Alan Stern
  2006-07-31 20:34         ` Alan Stern
  0 siblings, 2 replies; 18+ messages in thread
From: Andrew Morton @ 2006-07-31 15:11 UTC (permalink / raw)
  To: Alan Stern; +Cc: cpufreq, torvalds, linux-kernel, jesse.brandeburg

On Mon, 31 Jul 2006 10:54:55 -0400 (EDT)
Alan Stern <stern@rowland.harvard.edu> wrote:

> On Sun, 30 Jul 2006, Andrew Morton wrote:
> 
> > On Sun, 30 Jul 2006 21:13:48 -0700
> > "Jesse Brandeburg" <jesse.brandeburg@gmail.com> wrote:
> > 
> > > On 7/29/06, Linus Torvalds <torvalds@osdl.org> wrote:
> > > >
> > > > Ok, this missed a week (it should really have been -rc4, and we should
> > > > have had a -rc3 a week ago), but the fact is, with a lot of people at the
> > > > kernel summit and at OLS, it was so quiet for a week that there simply was
> > > > no point.
> > > 
> > > not sure if this is a regression or not, get this on my IBM thinkpad
> > > T43 when resuming from S3 or from hibernate to disk.
> > > 
> > > acpi acpi: suspend
> > > PM: Entering mem sleep
> > > Intel machine check architecture supported.
> > > Intel machine check reporting enabled on CPU#0.
> > > Back to C!
> > > BUG: sleeping function called from invalid context at kernel/rwsem.c:20
> > > in_atomic():0, irqs_disabled():1
> > >  [<c012d638>] down_read+0x12/0x1f
> > >  [<c012605b>] blocking_notifier_call_chain+0xe/0x29
> > >  [<c029199a>] cpufreq_resume+0x118/0x13f
> > >  [<c0231b68>] __sysdev_resume+0x20/0x53
> > >  [<c0231ca9>] sysdev_resume+0x16/0x47
> > >  [<c0235f93>] device_power_up+0x5/0xa
> > >  [<c013358d>] suspend_enter+0x3b/0x44
> > >  [<c011b644>] printk+0x1b/0x1f
> > >  [<c01336fe>] enter_state+0x168/0x198
> > >  [<c01337b3>] state_store+0x85/0x99
> > >  [<c013372e>] state_store+0x0/0x99
> > >  [<c019047a>] subsys_attr_store+0x1e/0x22
> > >  [<c01906ca>] sysfs_write_file+0xa6/0xcc
> > >  [<c0190624>] sysfs_write_file+0x0/0xcc
> > >  [<c015ae52>] vfs_write+0xa8/0x159
> > >  [<c015b398>] sys_write+0x41/0x67
> > >  [<c0102bc9>] sysenter_past_esp+0x56/0x79
> > > PM: Finishing wakeup.
> > > acpi acpi: resuming
> > > 
> > > full dmesg and .config attached, I can test patches.
> > 
> > I think this is the cpufreq problem wherein it sometimes requires that the
> > notifier chain be traversed from atomic context and at other times it
> > requires that sleeping functions be callable from within the traversal. 
> > IOW: we're screwed whatever type of locking we use on that chain.
> 
> I have looked at that problem more closely, and my earlier understanding
> wasn't quite right.  It's not that the context needs to be atomic at some
> times but not others -- it should always be a process context.  The
> problem is that the suspend and resume traversals are done at a time when
> interrupts need to remain disabled, since cpufreq registers its drivers as
> sysdevs.  (Kind of like SYSTEM_BOOTING, except that system_state isn't set
> to anything special.)  Because the down_read() call that protects the
> notifier chain isn't allowed when interrupts are disabled, the BUG occurs.

So why wouldn't an atomic notifier be suitable?

> > I think Alan is cooking up a scheme wherein we fix this with an srcu-locked
> > notifier chain.  If so, it'd be nice to get that moving along a bit?
> 
> Yes; protecting the notifier chain by SRCU instead of an rwsem will 
> prevent the problem.  It's a trivial change, except for one thing: SRCU 
> structures require initialization at runtime before they can be used.  
> This initialization must be done before any driver tries to register on 
> the cpufreq transition notifier chain.
> 
> If someone could give me a hint where a good place would be to carry out
> the initialization, I'd appreciate it.  Would an initcall be appropriate?  
> And if so, which sort of initcall?  core_initcall?  The only requirement 
> is that alloc_percpu() must be available.
> 

core_initcall() would suit.  That's actually a bit late for this sort of
thing, but we can always add a new section later if it becomes a problem. 
I'd suggest that we ensure that srcu_notifier_chain_register() performs a
reliable BUG() if it gets called too early.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Linux v2.6.18-rc3
  2006-07-31 15:11       ` Andrew Morton
@ 2006-07-31 15:59         ` Alan Stern
  2006-07-31 20:34         ` Alan Stern
  1 sibling, 0 replies; 18+ messages in thread
From: Alan Stern @ 2006-07-31 15:59 UTC (permalink / raw)
  To: Andrew Morton; +Cc: jesse.brandeburg, linux-kernel, torvalds, cpufreq

On Mon, 31 Jul 2006, Andrew Morton wrote:

> > > I think this is the cpufreq problem wherein it sometimes requires that the
> > > notifier chain be traversed from atomic context and at other times it
> > > requires that sleeping functions be callable from within the traversal. 
> > > IOW: we're screwed whatever type of locking we use on that chain.
> > 
> > I have looked at that problem more closely, and my earlier understanding
> > wasn't quite right.  It's not that the context needs to be atomic at some
> > times but not others -- it should always be a process context.  The
> > problem is that the suspend and resume traversals are done at a time when
> > interrupts need to remain disabled, since cpufreq registers its drivers as
> > sysdevs.  (Kind of like SYSTEM_BOOTING, except that system_state isn't set
> > to anything special.)  Because the down_read() call that protects the
> > notifier chain isn't allowed when interrupts are disabled, the BUG occurs.
> 
> So why wouldn't an atomic notifier be suitable?

I can't be entirely certain.  It looks like most of the callout routines
would work fine in an atomic context, but there are a couple of 
exceptions:

drivers/pcmcia/soc_common.c:soc_pcmcia_notifier() does a down().  Perhaps
this should be made conditional on the notifier message not being
CPUFREQ_SUSPENDCHANGE or CPUFREQ_RESUMECHANGE.  Similarly,
arch/i386/kernel/tsc.c:time_cpufreq_notifier() calls
write_sequnlock_irq().

(Not to mention the fact that drivers/serial/sh-sci.c:sci_init() registers 
a cpufreq notifier but sci_exit() neglects to unregister it.)

In general, the callout routines don't seem to treat the PRECHANGE and
POSTCHANGE messages differently from the SUSPENDCHANGE and RESUMECHANGE
messages.  So I'm reluctant to split the two sorts of messages up into two
separate chains.

> > If someone could give me a hint where a good place would be to carry out
> > the initialization, I'd appreciate it.  Would an initcall be appropriate?  
> > And if so, which sort of initcall?  core_initcall?  The only requirement 
> > is that alloc_percpu() must be available.
> > 
> 
> core_initcall() would suit.  That's actually a bit late for this sort of
> thing, but we can always add a new section later if it becomes a problem. 
> I'd suggest that we ensure that srcu_notifier_chain_register() performs a
> reliable BUG() if it gets called too early.

Okay, let me work on a patch.

Alan Stern

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Linux v2.6.18-rc3
  2006-07-31 15:11       ` Andrew Morton
  2006-07-31 15:59         ` Alan Stern
@ 2006-07-31 20:34         ` Alan Stern
  2006-08-02  4:31           ` Jesse Brandeburg
  1 sibling, 1 reply; 18+ messages in thread
From: Alan Stern @ 2006-07-31 20:34 UTC (permalink / raw)
  To: jesse.brandeburg, Andrew Morton
  Cc: Kernel development list, torvalds, cpufreq

On Mon, 31 Jul 2006, Andrew Morton wrote:

> core_initcall() would suit.  That's actually a bit late for this sort of
> thing, but we can always add a new section later if it becomes a problem. 
> I'd suggest that we ensure that srcu_notifier_chain_register() performs a
> reliable BUG() if it gets called too early.

Here's a patch to test.  I can't try it out on my machine because 
2.6.18-rc2-mm1 (even without the patch) crashes partway through a 
suspend-to-disk, in a way that's extremely hard to debug.  Some sort of 
spinlock-related bug occurs within ioapic_write_entry.

Alan Stern


Index: 2.6.18-rc2-mm1/kernel/sys.c
===================================================================
--- 2.6.18-rc2-mm1.orig/kernel/sys.c
+++ 2.6.18-rc2-mm1/kernel/sys.c
@@ -516,7 +516,7 @@ EXPORT_SYMBOL_GPL(srcu_notifier_call_cha
 void srcu_init_notifier_head(struct srcu_notifier_head *nh)
 {
 	mutex_init(&nh->mutex);
-	init_srcu_struct(&nh->srcu);
+	BUG_ON(init_srcu_struct(&nh->srcu) < 0);
 	nh->head = NULL;
 }
 
Index: 2.6.18-rc2-mm1/kernel/srcu.c
===================================================================
--- 2.6.18-rc2-mm1.orig/kernel/srcu.c
+++ 2.6.18-rc2-mm1/kernel/srcu.c
@@ -42,11 +42,12 @@
  * to any other function.  Each srcu_struct represents a separate domain
  * of SRCU protection.
  */
-void init_srcu_struct(struct srcu_struct *sp)
+int init_srcu_struct(struct srcu_struct *sp)
 {
 	sp->completed = 0;
-	sp->per_cpu_ref = alloc_percpu(struct srcu_struct_array);
 	mutex_init(&sp->mutex);
+	sp->per_cpu_ref = alloc_percpu(struct srcu_struct_array);
+	return (sp->per_cpu_ref ? 0 : -ENOMEM);
 }
 
 /*
Index: 2.6.18-rc2-mm1/include/linux/srcu.h
===================================================================
--- 2.6.18-rc2-mm1.orig/include/linux/srcu.h
+++ 2.6.18-rc2-mm1/include/linux/srcu.h
@@ -43,7 +43,7 @@ struct srcu_struct {
 #define srcu_barrier()
 #endif /* #else #ifndef CONFIG_PREEMPT */
 
-void init_srcu_struct(struct srcu_struct *sp);
+int init_srcu_struct(struct srcu_struct *sp);
 void cleanup_srcu_struct(struct srcu_struct *sp);
 int srcu_read_lock(struct srcu_struct *sp);
 void srcu_read_unlock(struct srcu_struct *sp, int idx);
Index: 2.6.18-rc2-mm1/drivers/cpufreq/cpufreq.c
===================================================================
--- 2.6.18-rc2-mm1.orig/drivers/cpufreq/cpufreq.c
+++ 2.6.18-rc2-mm1/drivers/cpufreq/cpufreq.c
@@ -52,8 +52,14 @@ static void handle_update(void *data);
  * The mutex locks both lists.
  */
 static BLOCKING_NOTIFIER_HEAD(cpufreq_policy_notifier_list);
-static BLOCKING_NOTIFIER_HEAD(cpufreq_transition_notifier_list);
+static struct srcu_notifier_head cpufreq_transition_notifier_list;
 
+static int __init init_cpufreq_transition_notifier_list(void)
+{
+	srcu_init_notifier_head(&cpufreq_transition_notifier_list);
+	return 0;
+}
+core_initcall(init_cpufreq_transition_notifier_list);
 
 static LIST_HEAD(cpufreq_governor_list);
 static DEFINE_MUTEX (cpufreq_governor_mutex);
@@ -262,14 +268,14 @@ void cpufreq_notify_transition(struct cp
 				freqs->old = policy->cur;
 			}
 		}
-		blocking_notifier_call_chain(&cpufreq_transition_notifier_list,
+		srcu_notifier_call_chain(&cpufreq_transition_notifier_list,
 				CPUFREQ_PRECHANGE, freqs);
 		adjust_jiffies(CPUFREQ_PRECHANGE, freqs);
 		break;
 
 	case CPUFREQ_POSTCHANGE:
 		adjust_jiffies(CPUFREQ_POSTCHANGE, freqs);
-		blocking_notifier_call_chain(&cpufreq_transition_notifier_list,
+		srcu_notifier_call_chain(&cpufreq_transition_notifier_list,
 				CPUFREQ_POSTCHANGE, freqs);
 		if (likely(policy) && likely(policy->cpu == freqs->cpu))
 			policy->cur = freqs->new;
@@ -1049,7 +1055,7 @@ static int cpufreq_suspend(struct sys_de
 		freqs.old = cpu_policy->cur;
 		freqs.new = cur_freq;
 
-		blocking_notifier_call_chain(&cpufreq_transition_notifier_list,
+		srcu_notifier_call_chain(&cpufreq_transition_notifier_list,
 				    CPUFREQ_SUSPENDCHANGE, &freqs);
 		adjust_jiffies(CPUFREQ_SUSPENDCHANGE, &freqs);
 
@@ -1130,7 +1136,7 @@ static int cpufreq_resume(struct sys_dev
 			freqs.old = cpu_policy->cur;
 			freqs.new = cur_freq;
 
-			blocking_notifier_call_chain(
+			srcu_notifier_call_chain(
 					&cpufreq_transition_notifier_list,
 					CPUFREQ_RESUMECHANGE, &freqs);
 			adjust_jiffies(CPUFREQ_RESUMECHANGE, &freqs);
@@ -1176,7 +1182,7 @@ int cpufreq_register_notifier(struct not
 
 	switch (list) {
 	case CPUFREQ_TRANSITION_NOTIFIER:
-		ret = blocking_notifier_chain_register(
+		ret = srcu_notifier_chain_register(
 				&cpufreq_transition_notifier_list, nb);
 		break;
 	case CPUFREQ_POLICY_NOTIFIER:
@@ -1208,7 +1214,7 @@ int cpufreq_unregister_notifier(struct n
 
 	switch (list) {
 	case CPUFREQ_TRANSITION_NOTIFIER:
-		ret = blocking_notifier_chain_unregister(
+		ret = srcu_notifier_chain_unregister(
 				&cpufreq_transition_notifier_list, nb);
 		break;
 	case CPUFREQ_POLICY_NOTIFIER:

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Linux v2.6.18-rc3
  2006-07-31 20:34         ` Alan Stern
@ 2006-08-02  4:31           ` Jesse Brandeburg
  2006-08-02  4:59             ` Andrew Morton
  0 siblings, 1 reply; 18+ messages in thread
From: Jesse Brandeburg @ 2006-08-02  4:31 UTC (permalink / raw)
  To: Alan Stern; +Cc: Andrew Morton, Kernel development list, torvalds, cpufreq

On 7/31/06, Alan Stern <stern@rowland.harvard.edu> wrote:
> On Mon, 31 Jul 2006, Andrew Morton wrote:
>
> > core_initcall() would suit.  That's actually a bit late for this sort of
> > thing, but we can always add a new section later if it becomes a problem.
> > I'd suggest that we ensure that srcu_notifier_chain_register() performs a
> > reliable BUG() if it gets called too early.
>
> Here's a patch to test.  I can't try it out on my machine because
> 2.6.18-rc2-mm1 (even without the patch) crashes partway through a
> suspend-to-disk, in a way that's extremely hard to debug.  Some sort of
> spinlock-related bug occurs within ioapic_write_entry.

can't test because I also can't suspend or hibernate with rc2-mm1
(resume causes hard hang with the backlight and screen off)  The issue
i reported was against linus' 2.6.18-rc3 kernel.

patch didn't apply to 2.6.18-rc3.

Thanks,
  Jesse

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Linux v2.6.18-rc3
  2006-08-02  4:31           ` Jesse Brandeburg
@ 2006-08-02  4:59             ` Andrew Morton
  2006-08-02 19:57               ` Jesse Brandeburg
  0 siblings, 1 reply; 18+ messages in thread
From: Andrew Morton @ 2006-08-02  4:59 UTC (permalink / raw)
  To: Jesse Brandeburg; +Cc: stern, linux-kernel, torvalds, cpufreq

On Tue, 1 Aug 2006 21:31:22 -0700
"Jesse Brandeburg" <jesse.brandeburg@gmail.com> wrote:

> On 7/31/06, Alan Stern <stern@rowland.harvard.edu> wrote:
> > On Mon, 31 Jul 2006, Andrew Morton wrote:
> >
> > > core_initcall() would suit.  That's actually a bit late for this sort of
> > > thing, but we can always add a new section later if it becomes a problem.
> > > I'd suggest that we ensure that srcu_notifier_chain_register() performs a
> > > reliable BUG() if it gets called too early.
> >
> > Here's a patch to test.  I can't try it out on my machine because
> > 2.6.18-rc2-mm1 (even without the patch) crashes partway through a
> > suspend-to-disk, in a way that's extremely hard to debug.  Some sort of
> > spinlock-related bug occurs within ioapic_write_entry.
> 
> can't test because I also can't suspend or hibernate with rc2-mm1
> (resume causes hard hang with the backlight and screen off)  The issue
> i reported was against linus' 2.6.18-rc3 kernel.
> 

This might help?


author Jiri Slaby <ku@bellona.localdomain> Tue, 01 Aug 2006 01:16:13 +0159

--- a/arch/i386/kernel/io_apic.c
+++ b/arch/i386/kernel/io_apic.c
@@ -2360,6 +2360,7 @@ static int ioapic_resume(struct sys_devi
 		reg_00.bits.ID = mp_ioapics[dev->id].mpc_apicid;
 		io_apic_write(dev->id, 0, reg_00.raw);
 	}
+	spin_unlock_irqrestore(&ioapic_lock, flags);
 	for (i = 0; i < nr_ioapic_registers[dev->id]; i ++)
 		ioapic_write_entry(dev->id, i, entry[i]);
 
-

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Linux v2.6.18-rc3
  2006-08-02  4:59             ` Andrew Morton
@ 2006-08-02 19:57               ` Jesse Brandeburg
  2006-08-02 20:16                 ` Rafael J. Wysocki
  0 siblings, 1 reply; 18+ messages in thread
From: Jesse Brandeburg @ 2006-08-02 19:57 UTC (permalink / raw)
  To: Andrew Morton; +Cc: stern, linux-kernel, torvalds, cpufreq

On 8/1/06, Andrew Morton <akpm@osdl.org> wrote:
> On Tue, 1 Aug 2006 21:31:22 -0700
> "Jesse Brandeburg" <jesse.brandeburg@gmail.com> wrote:
>
> > On 7/31/06, Alan Stern <stern@rowland.harvard.edu> wrote:
> > > On Mon, 31 Jul 2006, Andrew Morton wrote:
> > >
> > > > core_initcall() would suit.  That's actually a bit late for this sort of
> > > > thing, but we can always add a new section later if it becomes a problem.
> > > > I'd suggest that we ensure that srcu_notifier_chain_register() performs a
> > > > reliable BUG() if it gets called too early.
> > >
> > > Here's a patch to test.  I can't try it out on my machine because
> > > 2.6.18-rc2-mm1 (even without the patch) crashes partway through a
> > > suspend-to-disk, in a way that's extremely hard to debug.  Some sort of
> > > spinlock-related bug occurs within ioapic_write_entry.
> >
> > can't test because I also can't suspend or hibernate with rc2-mm1
> > (resume causes hard hang with the backlight and screen off)  The issue
> > i reported was against linus' 2.6.18-rc3 kernel.
> >
>
> This might help?
>
>
> author Jiri Slaby <ku@bellona.localdomain> Tue, 01 Aug 2006 01:16:13 +0159
>
> --- a/arch/i386/kernel/io_apic.c
> +++ b/arch/i386/kernel/io_apic.c
> @@ -2360,6 +2360,7 @@ static int ioapic_resume(struct sys_devi
>                 reg_00.bits.ID = mp_ioapics[dev->id].mpc_apicid;
>                 io_apic_write(dev->id, 0, reg_00.raw);
>         }
> +       spin_unlock_irqrestore(&ioapic_lock, flags);
>         for (i = 0; i < nr_ioapic_registers[dev->id]; i ++)
>                 ioapic_write_entry(dev->id, i, entry[i]);
>
> -
>
>

after applying this patch from jiri as well as the patch from alan, I
can now suspend and resume, and the patch from alan seems to work too,
but I have no idea if it executed.

BTW, I get junk out the serial port with the first bits of printk (and
during resume from S3 too) but then something manages to init the
serial port to the right speed and text starts coming out correctly.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Linux v2.6.18-rc3
  2006-08-02 19:57               ` Jesse Brandeburg
@ 2006-08-02 20:16                 ` Rafael J. Wysocki
  2006-08-02 20:23                   ` Russell King
  0 siblings, 1 reply; 18+ messages in thread
From: Rafael J. Wysocki @ 2006-08-02 20:16 UTC (permalink / raw)
  To: Jesse Brandeburg; +Cc: Andrew Morton, stern, linux-kernel, torvalds, cpufreq

On Wednesday 02 August 2006 21:57, Jesse Brandeburg wrote:
> On 8/1/06, Andrew Morton <akpm@osdl.org> wrote:
> > On Tue, 1 Aug 2006 21:31:22 -0700
> > "Jesse Brandeburg" <jesse.brandeburg@gmail.com> wrote:
> >
> > > On 7/31/06, Alan Stern <stern@rowland.harvard.edu> wrote:
> > > > On Mon, 31 Jul 2006, Andrew Morton wrote:
> > > >
> > > > > core_initcall() would suit.  That's actually a bit late for this sort of
> > > > > thing, but we can always add a new section later if it becomes a problem.
> > > > > I'd suggest that we ensure that srcu_notifier_chain_register() performs a
> > > > > reliable BUG() if it gets called too early.
> > > >
> > > > Here's a patch to test.  I can't try it out on my machine because
> > > > 2.6.18-rc2-mm1 (even without the patch) crashes partway through a
> > > > suspend-to-disk, in a way that's extremely hard to debug.  Some sort of
> > > > spinlock-related bug occurs within ioapic_write_entry.
> > >
> > > can't test because I also can't suspend or hibernate with rc2-mm1
> > > (resume causes hard hang with the backlight and screen off)  The issue
> > > i reported was against linus' 2.6.18-rc3 kernel.
> > >
> >
> > This might help?
> >
> >
> > author Jiri Slaby <ku@bellona.localdomain> Tue, 01 Aug 2006 01:16:13 +0159
> >
> > --- a/arch/i386/kernel/io_apic.c
> > +++ b/arch/i386/kernel/io_apic.c
> > @@ -2360,6 +2360,7 @@ static int ioapic_resume(struct sys_devi
> >                 reg_00.bits.ID = mp_ioapics[dev->id].mpc_apicid;
> >                 io_apic_write(dev->id, 0, reg_00.raw);
> >         }
> > +       spin_unlock_irqrestore(&ioapic_lock, flags);
> >         for (i = 0; i < nr_ioapic_registers[dev->id]; i ++)
> >                 ioapic_write_entry(dev->id, i, entry[i]);
> >
> > -
> >
> >
> 
> after applying this patch from jiri as well as the patch from alan, I
> can now suspend and resume, and the patch from alan seems to work too,
> but I have no idea if it executed.
> 
> BTW, I get junk out the serial port with the first bits of printk (and
> during resume from S3 too) but then something manages to init the
> serial port to the right speed and text starts coming out correctly.

Please try the following patch from Russell King and see if it helps.

 drivers/char/tty_io.c        |   13 +++++++++++++
 drivers/serial/serial_core.c |   12 ++++++------
 include/linux/tty.h          |    2 ++
 3 files changed, 21 insertions(+), 6 deletions(-)

Index: linux-2.6.18-rc1-mm2/drivers/char/tty_io.c
===================================================================
--- linux-2.6.18-rc1-mm2.orig/drivers/char/tty_io.c
+++ linux-2.6.18-rc1-mm2/drivers/char/tty_io.c
@@ -1663,6 +1663,19 @@ release_mem_out:
 }
 
 /*
+ * Get a copy of the termios structure for the driver/index
+ */
+void tty_get_termios(struct tty_driver *driver, int idx, struct termios *tio)
+{
+	lock_kernel();
+	if (driver->termios[idx])
+		*tio = *driver->termios[idx];
+	else
+		*tio = driver->init_termios;
+	unlock_kernel();
+}
+
+/*
  * Releases memory associated with a tty structure, and clears out the
  * driver table slots.
  */
Index: linux-2.6.18-rc1-mm2/drivers/serial/serial_core.c
===================================================================
--- linux-2.6.18-rc1-mm2.orig/drivers/serial/serial_core.c
+++ linux-2.6.18-rc1-mm2/drivers/serial/serial_core.c
@@ -1981,16 +1981,16 @@ int uart_resume_port(struct uart_driver 
 		struct termios termios;
 
 		/*
-		 * First try to use the console cflag setting.
+		 * Get the termios for this line
 		 */
-		memset(&termios, 0, sizeof(struct termios));
-		termios.c_cflag = port->cons->cflag;
+		tty_get_termios(drv->tty_driver, port->line, &termios);
 
 		/*
-		 * If that's unset, use the tty termios setting.
+		 * If the console cflag is still set, subsitute that
+		 * for the termios cflag.
 		 */
-		if (state->info && state->info->tty && termios.c_cflag == 0)
-			termios = *state->info->tty->termios;
+		if (port->cons->cflag)
+			termios.c_cflag = port->cons->cflag;
 
 		port->ops->set_termios(port, &termios, NULL);
 		console_start(port->cons);
Index: linux-2.6.18-rc1-mm2/include/linux/tty.h
===================================================================
--- linux-2.6.18-rc1-mm2.orig/include/linux/tty.h
+++ linux-2.6.18-rc1-mm2/include/linux/tty.h
@@ -284,6 +284,8 @@ extern int tty_read_raw_data(struct tty_
 			     int buflen);
 extern void tty_write_message(struct tty_struct *tty, char *msg);
 
+extern void tty_get_termios(struct tty_driver *drv, int idx, struct termios *tio);
+
 extern int is_orphaned_pgrp(int pgrp);
 extern int is_ignored(int sig);
 extern int tty_signal(int sig, struct tty_struct *tty);

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Linux v2.6.18-rc3
  2006-08-02 20:16                 ` Rafael J. Wysocki
@ 2006-08-02 20:23                   ` Russell King
  2006-08-02 20:26                     ` Rafael J. Wysocki
  2006-08-02 20:32                     ` Dave Jones
  0 siblings, 2 replies; 18+ messages in thread
From: Russell King @ 2006-08-02 20:23 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Andrew Morton, linux-kernel, Jesse Brandeburg, torvalds, stern,
	cpufreq

On Wed, Aug 02, 2006 at 10:16:54PM +0200, Rafael J. Wysocki wrote:
> Please try the following patch from Russell King and see if it helps.

I'd have missed this if it weren't for that comment.  It hasn't been
merged so far due to the lack of feedback on it...  Thanks for trying
to get that feedback.

-- 
Russell King
 Linux kernel    2.6 ARM Linux   - http://www.arm.linux.org.uk/
 maintainer of:  2.6 Serial core

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Linux v2.6.18-rc3
  2006-08-02 20:23                   ` Russell King
@ 2006-08-02 20:26                     ` Rafael J. Wysocki
  2006-08-02 20:32                     ` Dave Jones
  1 sibling, 0 replies; 18+ messages in thread
From: Rafael J. Wysocki @ 2006-08-02 20:26 UTC (permalink / raw)
  To: Russell King
  Cc: Jesse Brandeburg, Andrew Morton, stern, linux-kernel, torvalds,
	cpufreq

On Wednesday 02 August 2006 22:23, Russell King wrote:
> On Wed, Aug 02, 2006 at 10:16:54PM +0200, Rafael J. Wysocki wrote:
> > Please try the following patch from Russell King and see if it helps.
> 
> I'd have missed this if it weren't for that comment.  It hasn't been
> merged so far due to the lack of feedback on it...  Thanks for trying
> to get that feedback.

Well, it fixes the problem for me. ;-)

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Linux v2.6.18-rc3
  2006-08-02 20:23                   ` Russell King
  2006-08-02 20:26                     ` Rafael J. Wysocki
@ 2006-08-02 20:32                     ` Dave Jones
  2006-08-02 20:58                       ` Russell King
  1 sibling, 1 reply; 18+ messages in thread
From: Dave Jones @ 2006-08-02 20:32 UTC (permalink / raw)
  To: Rafael J. Wysocki, Jesse Brandeburg, Andrew Morton, stern,
	linux-kernel, torvalds, cpufreq

On Wed, Aug 02, 2006 at 09:23:09PM +0100, Russell King wrote:
 > On Wed, Aug 02, 2006 at 10:16:54PM +0200, Rafael J. Wysocki wrote:
 > > Please try the following patch from Russell King and see if it helps.
 > 
 > I'd have missed this if it weren't for that comment.  It hasn't been
 > merged so far due to the lack of feedback on it...  Thanks for trying
 > to get that feedback.

I'm fairly certain I gave feedback on this, and I'm sure I recall Pavel 
did too.  It did nothing for me.

		Dave

-- 
http://www.codemonkey.org.uk

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Linux v2.6.18-rc3
  2006-08-02 20:32                     ` Dave Jones
@ 2006-08-02 20:58                       ` Russell King
  2006-08-02 21:01                         ` Dave Jones
  2006-08-02 21:18                         ` Linus Torvalds
  0 siblings, 2 replies; 18+ messages in thread
From: Russell King @ 2006-08-02 20:58 UTC (permalink / raw)
  To: Dave Jones, Rafael J. Wysocki, Jesse Brandeburg, Andrew Morton,
	stern, linux-kernel, torvalds, cpufreq

On Wed, Aug 02, 2006 at 04:32:36PM -0400, Dave Jones wrote:
> On Wed, Aug 02, 2006 at 09:23:09PM +0100, Russell King wrote:
>  > On Wed, Aug 02, 2006 at 10:16:54PM +0200, Rafael J. Wysocki wrote:
>  > > Please try the following patch from Russell King and see if it helps.
>  > 
>  > I'd have missed this if it weren't for that comment.  It hasn't been
>  > merged so far due to the lack of feedback on it...  Thanks for trying
>  > to get that feedback.
> 
> I'm fairly certain I gave feedback on this, and I'm sure I recall Pavel 
> did too.  It did nothing for me.

Maybe you did, but it never got here.  The last two messages in the
thread were mine posting this patch and pleading for some feedback
which never came.

Rafael has reported that it fixes his problem, which is great - and is
the first bit of feedback I've received on it (thanks Rafael.)

I've no idea why it doesn't work for you though.

-- 
Russell King
 Linux kernel    2.6 ARM Linux   - http://www.arm.linux.org.uk/
 maintainer of:  2.6 Serial core

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Linux v2.6.18-rc3
  2006-08-02 20:58                       ` Russell King
@ 2006-08-02 21:01                         ` Dave Jones
  2006-08-02 21:18                         ` Linus Torvalds
  1 sibling, 0 replies; 18+ messages in thread
From: Dave Jones @ 2006-08-02 21:01 UTC (permalink / raw)
  To: Rafael J. Wysocki, Jesse Brandeburg, Andrew Morton, stern,
	linux-kernel, torvalds, cpufreq

On Wed, Aug 02, 2006 at 09:58:24PM +0100, Russell King wrote:

 > Maybe you did, but it never got here.  The last two messages in the
 > thread were mine posting this patch and pleading for some feedback
 > which never came.
 > 
 > Rafael has reported that it fixes his problem, which is great - and is
 > the first bit of feedback I've received on it (thanks Rafael.)
 > 
 > I've no idea why it doesn't work for you though.

doesn't make it any worse at least ;-)

		Dave
-- 
http://www.codemonkey.org.uk

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Linux v2.6.18-rc3
  2006-08-02 20:58                       ` Russell King
  2006-08-02 21:01                         ` Dave Jones
@ 2006-08-02 21:18                         ` Linus Torvalds
  2006-08-02 21:38                           ` Russell King
  1 sibling, 1 reply; 18+ messages in thread
From: Linus Torvalds @ 2006-08-02 21:18 UTC (permalink / raw)
  To: Russell King
  Cc: Dave Jones, Rafael J. Wysocki, Jesse Brandeburg, Andrew Morton,
	stern, linux-kernel, cpufreq



On Wed, 2 Aug 2006, Russell King wrote:
> 
> Rafael has reported that it fixes his problem, which is great - and is
> the first bit of feedback I've received on it (thanks Rafael.)
> 
> I've no idea why it doesn't work for you though.

Well, more importantly, why would we do something like this in the first 
place?

Wouldn't it be a _lot_ better to just use the bog-standard 
"suspend/resume" callbacks, and let serial drivers just suspend/resume on 
their own, instead of having upper layers generate these fake 
"set_termios()" calls?

The serial layer should use set_termios() when users set the termios state 
(surprise surprise), not to emulate suspend/restore. 

Real hardware tends to want to do a lot more _anyway_ for suspend/restore, 
so the argument that "set_termios()" already exists as an interface is 
pretty bogus.

			Linus

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Linux v2.6.18-rc3
  2006-08-02 21:18                         ` Linus Torvalds
@ 2006-08-02 21:38                           ` Russell King
  2006-08-02 22:04                             ` Linus Torvalds
  2006-08-02 22:05                             ` Russell King
  0 siblings, 2 replies; 18+ messages in thread
From: Russell King @ 2006-08-02 21:38 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Dave Jones, Rafael J. Wysocki, Jesse Brandeburg, Andrew Morton,
	stern, linux-kernel, cpufreq

On Wed, Aug 02, 2006 at 02:18:55PM -0700, Linus Torvalds wrote:
> Well, more importantly, why would we do something like this in the first 
> place?

The low level drivers can do that already if they so wish.  We provide
a library function to allow them to do the generic parts, which is
what we're talking about here.

> The serial layer should use set_termios() when users set the termios state 
> (surprise surprise), not to emulate suspend/restore.

Yes Linus, you're obviously right.  Would you mind re-engineering this
while I'm away for the next few days.  For _ALL_ serial drivers, not
just 8250.  Thanks.

-- 
Russell King
 Linux kernel    2.6 ARM Linux   - http://www.arm.linux.org.uk/
 maintainer of:  2.6 Serial core

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Linux v2.6.18-rc3
  2006-08-02 21:38                           ` Russell King
@ 2006-08-02 22:04                             ` Linus Torvalds
  2006-08-02 22:05                             ` Russell King
  1 sibling, 0 replies; 18+ messages in thread
From: Linus Torvalds @ 2006-08-02 22:04 UTC (permalink / raw)
  To: Russell King
  Cc: Dave Jones, Rafael J. Wysocki, Jesse Brandeburg, Andrew Morton,
	stern, linux-kernel, cpufreq



On Wed, 2 Aug 2006, Russell King wrote:
> 
> > The serial layer should use set_termios() when users set the termios state 
> > (surprise surprise), not to emulate suspend/restore.
> 
> Yes Linus, you're obviously right.  Would you mind re-engineering this
> while I'm away for the next few days.  For _ALL_ serial drivers, not
> just 8250.  Thanks.

The problem is that right now, the silly set_termios() call can be 
actively detrimental to sub-drivers that do this right. 

I suspect it would be a lot better to just fix a few major serial drivers 
(and yes, that means primarily 8250), and just force others to fix 
themselves as developers get around to it and care (in many cases they 
might not even do so). As it is, going to set_termios way is likely to 
just make things _worse_ in the long run, by just not letting the serial 
driver do what's sane.

		Linus

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Linux v2.6.18-rc3
  2006-08-02 21:38                           ` Russell King
  2006-08-02 22:04                             ` Linus Torvalds
@ 2006-08-02 22:05                             ` Russell King
  1 sibling, 0 replies; 18+ messages in thread
From: Russell King @ 2006-08-02 22:05 UTC (permalink / raw)
  To: Linus Torvalds, Dave Jones, Rafael J. Wysocki, Jesse Brandeburg,
	Andrew Morton, stern, linux-kernel, cpufreq

On Wed, Aug 02, 2006 at 10:38:34PM +0100, Russell King wrote:
> On Wed, Aug 02, 2006 at 02:18:55PM -0700, Linus Torvalds wrote:
> > The serial layer should use set_termios() when users set the termios state 
> > (surprise surprise), not to emulate suspend/restore.
> 
> Yes Linus, you're obviously right.  Would you mind re-engineering this
> while I'm away for the next few days.  For _ALL_ serial drivers, not
> just 8250.  Thanks.

BTW, you'll need to replicate some of the quirk behaviour on some 8250
compatible UARTs when restoring registers - you can't blindly write
the registers in any one particular order.

Some UARTs have side effects which reset some features if you do that
(eg, TI16750 UARTs with 64 byte FIFOs need a different register writing
order from everything else, otherwise it turns off the 64-byte FIFOs.)
Others require you to write a sequence of values to a register, etc.

Okay, consider me gone from this point forwards for the next few days.

-- 
Russell King
 Linux kernel    2.6 ARM Linux   - http://www.arm.linux.org.uk/
 maintainer of:  2.6 Serial core

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2006-08-02 22:05 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <Pine.LNX.4.64.0607292320490.4168@g5.osdl.org>
     [not found] ` <4807377b0607302113i4e984ff6j1ebae5562148907c@mail.gmail.com>
2006-07-31  4:27   ` Linux v2.6.18-rc3 Andrew Morton
2006-07-31 14:54     ` Alan Stern
2006-07-31 15:11       ` Andrew Morton
2006-07-31 15:59         ` Alan Stern
2006-07-31 20:34         ` Alan Stern
2006-08-02  4:31           ` Jesse Brandeburg
2006-08-02  4:59             ` Andrew Morton
2006-08-02 19:57               ` Jesse Brandeburg
2006-08-02 20:16                 ` Rafael J. Wysocki
2006-08-02 20:23                   ` Russell King
2006-08-02 20:26                     ` Rafael J. Wysocki
2006-08-02 20:32                     ` Dave Jones
2006-08-02 20:58                       ` Russell King
2006-08-02 21:01                         ` Dave Jones
2006-08-02 21:18                         ` Linus Torvalds
2006-08-02 21:38                           ` Russell King
2006-08-02 22:04                             ` Linus Torvalds
2006-08-02 22:05                             ` Russell King

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox