2.6.3-rc3 (and possibly earlier 2.6): weird hang and oopses

* 2.6.3-rc3 (and possibly earlier 2.6): weird hang and oopses
@ 2004-02-16 22:47 Alessandro Suardi
  0 siblings, 0 replies; 4+ messages in thread
From: Alessandro Suardi @ 2004-02-16 22:47 UTC
  To: linux-kernel; +Cc: linux-acpi

[CC:ing linux-acpi since some acpi stuff appears in backtraces]

While apparently doing nothing special (possibly a 'rm' on a
  regular ext3 filesystem) my laptop hung. Not completely, as
  I could

  * switch virtual desktops within Ximian Desktop 2
  * click on the kill window top right button, see the "app is
     not responding, kill it anyway ?" dialog, say ok, see the
     gnome-terminal vanish
  * Alt-Fn to virtual consoles, type a login name (but getting
     no prompt for the password - this hung)
  * Alt-SysRq


Trying to get more info, I pressed Alt-SysRq-P and saw this (handcopied,
  but should be fairly reliable :) ):


Pid: 0, comm:     swapper
EIP: 0060: acpi_processor_idle+0x13c/0x1cb

  default_idle+0x0/0x27
  rest_init+0x0/0x5e
  acpi_ut_copy_ipackage_to_ipackage+0x69/0xdb
  default_idle+0x0/0x27
  rest_init+0x0/0x5e
  cpu_idle+0x2e/0x37
  start_kernel+0x182/0x1b0
  unknown_bootoption+0x0/0xff


While copying this down, there were 'ps' oopses at regular
  intervals (say 2/3 minutes apart from each other), with this
  further oops trace:

  pid_revalidate+0x28/0xd2
  pid_revalidate+0x41/0xd2
  dput+0x22/0x21f
  link_path_walk+0x61b/0x957
  buffered_rmqueue+0xc1/0x15a
  __alloc_pages+0xa4/0x342
  proc_info_read+0x74/0x155
  filp_open+0x67/0x69
  vfs_read+0xbc/0x127
  sys_read+0x42/0x63
  sysenter_past_esp+0x52/0x71

And right after each oops a further trace, with the warning
  that 'ps' exited with a preempt_count of 1:

Bad: scheduling while atomic

  schedule
  unmap_page_range
  unmap_vmas
  exit_mmap
  mmput
  do_exit
  do_divide
  do_page_fault
  acpi_processor_set_performance
  error_code
  file_read_actor

There was more, but I couldn't copy further info due to pressing
  time constraints. This isn't the first time a 2.6.x kernel hangs
  on me, and IIRC 2.6.1 never did.


Oh, and of course I still can't Alt-SysRq-B :(


Thanks for looking into this, ciao,

--alessandro

  "Two rivers run too deep
   The seasons change and so do I"
       (U2, "Indian Summer Sky")



* Re: 2.6.3-rc3 (and possibly earlier 2.6): weird hang and oopses
       [not found] <A6974D8E5F98D511BB910002A50A6647615F214C@hdsmsx402.hd.intel.com>
@ 2004-02-17  6:26 ` Len Brown
  2004-02-17 20:10   ` Alessandro Suardi
  0 siblings, 1 reply; 4+ messages in thread
From: Len Brown @ 2004-02-17  6:26 UTC
  To: Alessandro Suardi; +Cc: linux-kernel, ACPI Developers

Alessandro,
Sure looks like a failure in the ACPI processor driver.

Please confirm that your system is otherwise happy when you disable the
processor driver, i.e. CONFIG_ACPI_PROCESSOR=n

Also, it would be helpful to know whether this failure started recently or
whether you saw it in previous releases, because we've made some changes to
the processor driver recently.

thanks,
-Len

P.S. acpi-devel@lists.sourceforge.net is the preferred alias for
Linux ACPI issues -- it includes linux-acpi@intel.com, which is a small
subset.

On Mon, 2004-02-16 at 17:47, Alessandro Suardi wrote:
> [CC:ing linux-acpi since some acpi stuff appears in backtraces]
> 
> While apparently doing nothing special (possibly a 'rm' on a
>   regular ext3 filesystem) my laptop hung. Not completely, as
>   I could
> 
>   * switch virtual desktops within Ximian Desktop 2
>   * click on the kill window top right button, see the "app is
>      not responding, kill it anyway ?" dialog, say ok, see the
>      gnome-terminal vanish
>   * Alt-Fn to virtual consoles, type a login name (but getting
>      no prompt for the password - this hung)
>   * Alt-SysRq
> 
> 
> Trying to get more info, I Alt-SysRq-P seeing this (handcopied
>   but should be fairly reliable :) :
> 
> 
> Pid: 0, comm:     swapper
> EIP: 0060: acpi_processor_idle+0x13c/0x1cb
> 
>   default_idle+0x0/0x27
>   rest_init+0x0/0x5e
>   acpi_ut_copy_ipackage_to_ipackage+0x69/0xdb
>   default_idle+0x0/0x27
>   rest_init+0x0/0x5e
>   cpu_idle+0x2e/0x37
>   start_kernel+0x182/0x1b0
>   unknown_bootoption+0x0/0xff
> 
> 
> While copying this down, there were 'ps' oopses at regular
>   intervals (say 2/3 minutes apart from each other), with this
>   further oops trace:
> 
>   pid_revalidate+0x28/0xd2
>   pid_revalidate+0x41/0xd2
>   dput+0x22/0x21f
>   link_path_walk+0x61b/0x957
>   buffered_rmqueue+0xc1/0x15a
>   __alloc_pages+0xa4/0x342
>   proc_info_read+0x74/0x155
>   filp_open+0x67/0x69
>   vfs_read+0xbc/0x127
>   sys_read+0x42/0x63
>   sysenter_past_esp+0x52/0x71
> 
> And right after each oops a further trace, with the warning
>   that 'ps' exited with a preempt_count of 1:
> 
> Bad: scheduling while atomic
> 
>   schedule
>   unmap_page_range
>   unmap_vmas
>   exit_mmap
>   mmput
>   do_exit
>   do_divide
>   do_page_fault
>   acpi_processor_set_performance
>   error_code
>   file_read_actor
> 
> There was more, but I couldn't copy further info due to pressing
>   time constraints. This isn't the first time a 2.6.x kernel hangs
>   on me, and IIRC 2.6.1 never did.
> 
> 
> Oh, and of course I still can't Alt-SysRq-B :(
> 
> 
> Thanks for looking into this, ciao,
> 
> --alessandro
> 
>   "Two rivers run too deep
>    The seasons change and so do I"
>        (U2, "Indian Summer Sky")
> 



* Re: 2.6.3-rc3 (and possibly earlier 2.6): weird hang and oopses
  2004-02-17  6:26 ` 2.6.3-rc3 (and possibly earlier 2.6): weird hang and oopses Len Brown
@ 2004-02-17 20:10   ` Alessandro Suardi
  2004-02-18  9:00     ` [ACPI] " Dominik Brodowski
  0 siblings, 1 reply; 4+ messages in thread
From: Alessandro Suardi @ 2004-02-17 20:10 UTC
  To: Len Brown; +Cc: linux-kernel, ACPI Developers

Len Brown wrote:
> Alessandro,
> Sure looks like a failure in the ACPI processor driver.
>
> Please confirm your system is otherwise happy when you disable the
> processor driver.  eg. CONFIG_ACPI_PROCESSOR=n
>
> Also, it would be helpful to know if this failure started recently or
> you saw it in previous releases, b/c we've made some changes to the
> processor driver recently.

I'll run with CONFIG_ACPI_PROCESSOR=n for the next couple of weeks.
  I checked my logs and noticed that my first hang happened with 2.6.2,
  but so far I have experienced the problem only twice since Feb 6.

I just now noticed that in /var/log I have the full Oops traces
  (until I Alt-SysRq'd out of it), so I'm attaching them; would you
  please take a further look and confirm this is _only_ an ACPI-related
  issue ?

messages.gz is 2.6.3-rc3, messages.2.gz is 2.6.2 vanilla.

> thanks,
> -Len
> 
> ps. acpi-devel@lists.sourceforge.net is the preferred alias to send
> Linux ACPI issues -- it includes linux-acpi@intel.com which is a small
> sub-set.

OK, thanks for the info, will do next time.

> On Mon, 2004-02-16 at 17:47, Alessandro Suardi wrote:
> 
>>[CC:ing linux-acpi since some acpi stuff appears in backtraces]
>>
>>While apparently doing nothing special (possibly a 'rm' on a
>>  regular ext3 filesystem) my laptop hung. Not completely, as
>>  I could
>>
>>  * switch virtual desktops within Ximian Desktop 2
>>  * click on the kill window top right button, see the "app is
>>     not responding, kill it anyway ?" dialog, say ok, see the
>>     gnome-terminal vanish
>>  * Alt-Fn to virtual consoles, type a login name (but getting
>>     no prompt for the password - this hung)
>>  * Alt-SysRq

Many thanks,

--alessandro

  "Two rivers run too deep
   The seasons change and so do I"
       (U2, "Indian Summer Sky")

[-- Attachment #2: messages.gz --]
[-- Type: application/x-gzip, Size: 5232 bytes --]

[-- Attachment #3: messages.2.gz --]
[-- Type: application/x-gzip, Size: 8850 bytes --]


* Re: [ACPI] Re: 2.6.3-rc3 (and possibly earlier 2.6): weird hang and oopses
  2004-02-17 20:10   ` Alessandro Suardi
@ 2004-02-18  9:00     ` Dominik Brodowski
  0 siblings, 0 replies; 4+ messages in thread
From: Dominik Brodowski @ 2004-02-18  9:00 UTC
  To: Alessandro Suardi; +Cc: Len Brown, linux-kernel, ACPI Developers

On Tue, Feb 17, 2004 at 09:10:22PM +0100, Alessandro Suardi wrote:
> Will run from now for a couple of weeks with CONFIG_ACPI_PROCESSOR=n;
>  I checked my logs and noticed my first hang happened with 2.6.2,

IIRC 2.6.2 didn't yet contain the processor updates...

> I just now noticed that in /var/log I have the full Oops traces
>  (until I Alt-SysRq'd out of it), so I'm attaching them; would you
>  please take a further look and confirm this is _only_ an ACPI-related
>  issue ?

The first Oops does not seem to be related to ACPI:

Feb 16 15:57:48 incident kernel: Oops: 0000 [#1]
Feb 16 15:57:48 incident kernel: CPU:    0
Feb 16 15:57:48 incident kernel: EIP:    0060:[<c0242045>]    Not tainted
Feb 16 15:57:48 incident kernel: EFLAGS: 00010246
Feb 16 15:57:48 incident kernel: EIP is at init_dev+0x2b/0x567
Feb 16 15:57:48 incident kernel: eax: a1192400   ebx: d2e6e000   ecx: c0418f38   edx: 00008802
Feb 16 15:57:48 incident kernel: esi: 02000000   edi: f3986480   ebp: a1192400   esp: d2e6fe98
Feb 16 15:57:48 incident kernel: ds: 007b   es: 007b   ss: 0068
Feb 16 15:57:48 incident kernel: Process sh (pid: 7260, threadinfo=d2e6e000 task=cd98ecc0)
Feb 16 15:57:48 incident kernel: Stack: a1192400 00000000 f7dabb80 c01655e6 f7dabb80 c03e52d0 00000000 f782f080 
Feb 16 15:57:48 incident kernel:        f30d2580 c015cc36 f7dabb80 d2e6ff04 d2e6ff00 f7dabb80 420d2290 f554c300 
Feb 16 15:57:48 incident kernel:        d2e6e000 02000000 f3986480 00500000 c0242eea 02000000 a1192400 d2e6ff00 
Feb 16 15:57:48 incident kernel: Call Trace:
Feb 16 15:57:48 incident kernel:  [<c01655e6>] dput+0x22/0x21f
Feb 16 15:57:48 incident kernel:  [<c015cc36>] link_path_walk+0x61b/0x957
Feb 16 15:57:48 incident kernel:  [<c0242eea>] tty_open+0x90/0x36d
Feb 16 15:57:48 incident kernel:  [<c0242e5a>] tty_open+0x0/0x36d
Feb 16 15:57:48 incident kernel:  [<c0157c7d>] chrdev_open+0xf3/0x21c
Feb 16 15:57:48 incident kernel:  [<c015d860>] open_namei+0xa6/0x400
Feb 16 15:57:48 incident kernel:  [<c0157b8a>] chrdev_open+0x0/0x21c
Feb 16 15:57:48 incident kernel:  [<c014e264>] dentry_open+0x14d/0x218
Feb 16 15:57:48 incident kernel:  [<c014e115>] filp_open+0x67/0x69
Feb 16 15:57:48 incident kernel:  [<c014e598>] sys_open+0x5b/0x8b
Feb 16 15:57:48 incident kernel:  [<c0108f9d>] sysenter_past_esp+0x52/0x71

... and neither is the second, but then the "bad: scheduling while atomic"
messages start. That call trace looks quite strange: there is no reason
ps should call "acpi_processor_set_performance". But then, the kernel is
already in an inconsistent state because of the two previous oopses...

Is the kernel compiled with frame pointers (CONFIG_FRAME_POINTER)? If not,
please change this setting to "y".
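
One quick way to confirm the two settings discussed in this thread is to scan
the kernel build configuration. The following is only a minimal sketch, not
something from the original mails: the helper name `config_value` and the
sample text are hypothetical, and it assumes the usual `.config` layout in
which an enabled option appears as `CONFIG_FOO=y` and a disabled one as
`# CONFIG_FOO is not set`.

```python
import re


def config_value(config_text, option):
    """Return the option's value ('y', 'm', ...), 'n' if explicitly unset,
    or None if the option does not appear at all."""
    for line in config_text.splitlines():
        line = line.strip()
        if line == f"# {option} is not set":
            return "n"
        match = re.fullmatch(rf"{re.escape(option)}=(.*)", line)
        if match:
            return match.group(1)
    return None


# Hypothetical sample; in practice, read the .config used to build the kernel.
sample = """\
CONFIG_ACPI=y
# CONFIG_ACPI_PROCESSOR is not set
CONFIG_FRAME_POINTER=y
"""

for opt in ("CONFIG_ACPI_PROCESSOR", "CONFIG_FRAME_POINTER"):
    print(opt, "->", config_value(sample, opt))
```

With the sample above, the processor driver is reported as disabled and frame
pointers as enabled, which is the combination requested in this thread.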

What follows then are other oopses and "bad: scheduling while atomic"
notices in which I cannot see any relation to ACPI.

Feb 16 16:07:16 incident kernel: SysRq : Show Regs
Feb 16 16:07:16 incident kernel: 
Feb 16 16:07:16 incident kernel: Pid: 0, comm:              swapper
Feb 16 16:07:16 incident kernel: EIP: 0060:[<c02380f8>] CPU: 0
Feb 16 16:07:16 incident kernel: EIP is at acpi_processor_idle+0x13c/0x1cb
Feb 16 16:07:16 incident kernel:  EFLAGS: 00000216    Not tainted
Feb 16 16:07:16 incident kernel: EAX: 0050d212 EBX: 00000808 ECX: 0050d079 EDX: 00000808
Feb 16 16:07:16 incident kernel: ESI: c1b7d2b0 EDI: c0105000 EBP: c1b7d200 DS: 007b ES: 007b
Feb 16 16:07:16 incident kernel: CR0: 8005003b CR2: 421b7000 CR3: 35924000 CR4: 000006d0
Feb 16 16:07:16 incident kernel: Call Trace:
Feb 16 16:07:16 incident kernel:  [<c0106cee>] default_idle+0x0/0x27
Feb 16 16:07:16 incident kernel:  [<c0105000>] rest_init+0x0/0x5e
Feb 16 16:07:16 incident kernel:  [<c023007b>] acpi_ut_copy_ipackage_to_ipackage+0x69/0xdb
Feb 16 16:07:16 incident kernel:  [<c0106cee>] default_idle+0x0/0x27
Feb 16 16:07:16 incident kernel:  [<c0105000>] rest_init+0x0/0x5e
Feb 16 16:07:16 incident kernel:  [<c0106d79>] cpu_idle+0x2e/0x37
Feb 16 16:07:16 incident kernel:  [<c0462686>] start_kernel+0x182/0x1b0
Feb 16 16:07:16 incident kernel:  [<c04623dd>] unknown_bootoption+0x0/0xff

acpi_processor_idle seems to be innocent; "ps" is causing an oops again:

Feb 16 16:08:28 incident kernel: Unable to handle kernel paging request at virtual address 02000064
Feb 16 16:08:28 incident kernel:  printing eip:
Feb 16 16:08:28 incident kernel: c017b7ce
Feb 16 16:08:28 incident kernel: *pde = 00000000
Feb 16 16:08:28 incident kernel: Oops: 0000 [#7]
Feb 16 16:08:28 incident kernel: CPU:    0
Feb 16 16:08:28 incident kernel: EIP:    0060:[<c017b7ce>]    Not tainted
Feb 16 16:08:28 incident kernel: EFLAGS: 00010286
Feb 16 16:08:28 incident kernel: EIP is at proc_pid_stat+0xa8/0x53c
Feb 16 16:08:28 incident kernel: eax: 00000000   ebx: 02000000   ecx: f4971000   edx: c03e6330
Feb 16 16:08:28 incident kernel: esi: e9b0d900   edi: c4c7c580   ebp: c3a58000   esp: c3a59e3c
Feb 16 16:08:28 incident kernel: ds: 007b   es: 007b   ss: 0068
Feb 16 16:08:28 incident kernel: Process ps (pid: 7430, threadinfo=c3a58000 task=d56e52e0)
Feb 16 16:08:28 incident kernel: Stack: c4c7c580 ffffffff 00000008 c4c7c780 00000010 f1db65f0 f7f57858 c3a58000 
Feb 16 16:08:28 incident kernel:        c3a58000 f1db6580 e3935006 c0179382 f1db6ef0 f7f570f8 c3a58000 c3a58000 
Feb 16 16:08:28 incident kernel:        f1db6e80 f254f510 c017939b e9b0d900 f1db6e80 c3a59f70 f7ff4700 c3a59f00 
Feb 16 16:08:28 incident kernel: Call Trace:
Feb 16 16:08:28 incident kernel:  [<c0179382>] pid_revalidate+0x28/0xd2
Feb 16 16:08:28 incident kernel:  [<c017939b>] pid_revalidate+0x41/0xd2
Feb 16 16:08:28 incident kernel:  [<c01655e6>] dput+0x22/0x21f
Feb 16 16:08:28 incident kernel:  [<c015cc36>] link_path_walk+0x61b/0x957
Feb 16 16:08:28 incident kernel:  [<c013741c>] buffered_rmqueue+0xc1/0x15a
Feb 16 16:08:28 incident kernel:  [<c0137559>] __alloc_pages+0xa4/0x342
Feb 16 16:08:28 incident kernel:  [<c01787e2>] proc_info_read+0x74/0x155
Feb 16 16:08:28 incident kernel:  [<c014e115>] filp_open+0x67/0x69
Feb 16 16:08:28 incident kernel:  [<c014ee92>] vfs_read+0xbc/0x127
Feb 16 16:08:28 incident kernel:  [<c014f11d>] sys_read+0x42/0x63
Feb 16 16:08:28 incident kernel:  [<c0108f9d>] sysenter_past_esp+0x52/0x71
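
For readers comparing these traces: each call-trace entry has the form
`[<address>] symbol+0xoffset/0xsize`, where the offset is the distance into
the function and the size is the function's total length. The sketch below
is an illustration rather than a tool used in this thread; the names
`FRAME_RE` and `parse_frames` are hypothetical. It splits such entries into
fields and derives each function's start address, using three lines copied
from the log above as input.

```python
import re

# Matches entries such as "[<c0179382>] pid_revalidate+0x28/0xd2"
FRAME_RE = re.compile(
    r"\[<([0-9a-f]{8})>\]\s+(\w+)\+0x([0-9a-f]+)/0x([0-9a-f]+)"
)


def parse_frames(log_text):
    """Extract call-trace entries from syslog text as a list of dicts."""
    frames = []
    for match in FRAME_RE.finditer(log_text):
        address = int(match.group(1), 16)
        offset = int(match.group(3), 16)
        frames.append({
            "symbol": match.group(2),
            "address": address,
            "offset": offset,
            "size": int(match.group(4), 16),
            # Start of the function = logged address minus offset into it.
            "start": address - offset,
        })
    return frames


sample = """\
Feb 16 16:08:28 incident kernel:  [<c0179382>] pid_revalidate+0x28/0xd2
Feb 16 16:08:28 incident kernel:  [<c017939b>] pid_revalidate+0x41/0xd2
Feb 16 16:08:28 incident kernel:  [<c01655e6>] dput+0x22/0x21f
"""

frames = parse_frames(sample)
for f in frames:
    print(f"{f['symbol']}: start=0x{f['start']:08x} "
          f"offset=0x{f['offset']:x} size=0x{f['size']:x}")
```

Both pid_revalidate entries resolve to the same start address, which is a
simple consistency check that the symbol/offset pairs were logged correctly.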

	Dominik

