* Kernel crashing problems on oldworld
@ 2001-11-23 10:25 Chris Tillman
  2001-11-25  4:44 ` Andrew Sharp
  0 siblings, 1 reply; 4+ messages in thread
From: Chris Tillman @ 2001-11-23 10:25 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: debian-powerpc


I have a PowerBase 180 oldworld powerpc with Debian woody
(2.2.19-pmac). I started having a crashing problem about 4 weeks ago.
The easiest way to repeat it was to use nano (1.0.5/1.0.6) to edit a
document and select the Search function or the save-file function. In
either case, the computer would freeze as it was attempting to update
the help keys across the bottom of the screen. This never occurred in
nano-tiny, which is a reduced-library version.

There is no kernel message with this crash; it simply freezes the
machine. I tried strace, but its log gets cut off shortly after the
keystroke which causes the crash, so it's not much help.
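
A trace that gets cut off like this is usually sitting in buffers
that never reach the disk before the freeze. One way around that
is to sync every checkpoint as it is written. The following is only
a minimal sketch, assuming a POSIX system; trace_open() and
trace_checkpoint() are hypothetical helper names, not part of nano,
ncurses or strace, and even a synced write may be lost if the disk
itself caches it.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: open (or create) a trace log in append mode.
 * O_SYNC asks the kernel to commit each write before write() returns. */
int trace_open(const char *path)
{
    assert(path != NULL);
    return open(path, O_WRONLY | O_CREAT | O_APPEND | O_SYNC, 0644);
}

/* Hypothetical helper: write one checkpoint line and flush it to disk.
 * Returns 0 on success, -1 on error. */
int trace_checkpoint(int fd, const char *msg)
{
    size_t len;

    assert(msg != NULL);
    len = strlen(msg);
    if (write(fd, msg, len) != (ssize_t)len)
        return -1;
    if (write(fd, "\n", 1) != 1)
        return -1;
    /* fsync() blocks until the kernel has pushed the data to the device. */
    return fsync(fd);
}
```

Calling trace_checkpoint(fd, "before _nc_flush") immediately before
the suspect call, and again right after it, should leave the last
checkpoint reached in the file after the reboot.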

I traced the execution of the code from nano into the ncurses library,
and determined that it was crashing right after a call to
_nc_flush(). Here is an example backtrace just prior to a crash:

in doupdate() at ../ncurses/tty/tty_update.c:787   _nc_flush();
called by wrefresh() at ../ncurses/base/lib-refresh.c:60
called by bottombars() at ../nano/winio.c:562
called by display_main_list() at ../nano/winio.c:1152
called by search_abort() at search.c:298

The maintainer says that _nc_flush() simply calls fflush. Of course,
that function is called many times in this and other programs -- it is
only these special circumstances in which it causes a crash.

This problem doesn't occur on my newworld powerpc, but it is
reliably repeatable on my oldworld. I also got several crashes in
dselect, which uses the same ncurses library.

I verified that the same thing happens if I install on a different
partition, even a different disk. It did *not* repeat on another
oldworld computer (PowerCenter 150) with the same woody installation
(it's on an external SCSI disk, so I just moved the disk to the other
machine), so I concluded it was some conflict with my ATY Mach64 card
(which, according to lspci, is an ATI 3D Rage I/II 215GT [Mach64GT]
(rev 41)). I tried adding video=atyfb to the kernel arguments, but that
did not affect it.

I constructed a patch which prevents the crash; not sure if it would
be much help in determining the true cause though. The patch just uses
wredrawln to claim that the 2 help lines are corrupt and asks for them
to be completely redrawn rather than just refreshed.

Another interesting point is that I can run all the ncurses test
programs except worm, which crashes in a few seconds, but only when
the console it's running on is actually visible. And, the blink and
underline attributes don't seem to work on my system.

Finally, I downloaded the kernel-image-2.4.12-powerpc package to see
if changing the kernel would make a difference. It did: the nano crash
disappeared. The worm crash is still present, though.

I can continue trying to trace the problem in worm, if there's
interest, since it's present in both kernels. Or if there's interest
in the kernel 2.2.19 nano problem, whatever... I am interested in
helping track down the problem because it's such a hard crash. My
filesystem is of course damaged when I have to hard reboot.

I signed up to the list. Let me know what I can try. And Happy
Thanksgiving!

--
*----------------------------------------------------------------*
|  .''`.  |   Debian GNU/Linux: <http://www.debian.org>          |
| : :'  : |   debian-imac: <http://debian-imac.sourceforge.net>  |
| `. `'`  |      Chris Tillman        tillman@azstarnet.com      |
|   `-    |            May the Source be with you                |
*----------------------------------------------------------------*

** Sent via the linuxppc-dev mail list. See http://lists.linuxppc.org/


* Re: Kernel crashing problems on oldworld
  2001-11-23 10:25 Kernel crashing problems on oldworld Chris Tillman
@ 2001-11-25  4:44 ` Andrew Sharp
  2001-11-26  1:09   ` Chris Tillman
  0 siblings, 1 reply; 4+ messages in thread
From: Andrew Sharp @ 2001-11-25  4:44 UTC (permalink / raw)
  To: linuxppc-dev, debian-powerpc


Chris Tillman wrote:
> I verified that the same thing happens if I install on a different
> partition, even a different disk. It did *not* repeat on another
> oldworld computer (PowerCenter 150) with the same woody installation
> (it's on an external SCSI disk, I just moved the disk to the new
> machine) - so I concluded it was some conflict with my ATY Mach64 card
> (which, according to lspci, is an ATI 3D Rage I/II 215GT [Mach64GT]
> (rev 41)). I tried adding video=atyfb to the kernel arguments, but that
> did not affect it.

Thanks.  Are you sure it's not just some memory that's gone bad
rather than a conflict with the video card?

a



* Re: Kernel crashing problems on oldworld
  2001-11-25  4:44 ` Andrew Sharp
@ 2001-11-26  1:09   ` Chris Tillman
  2001-11-26 19:49     ` Andrew Sharp
  0 siblings, 1 reply; 4+ messages in thread
From: Chris Tillman @ 2001-11-26  1:09 UTC (permalink / raw)
  To: linuxppc-dev, debian-powerpc


On Sat, Nov 24, 2001 at 08:44:31PM -0800, Andrew Sharp wrote:
>
> Thanks.  Are you sure it's not just some memory that's gone bad
> rather than a conflict with the video card?
>

Well, I don't know how it could hit the same memory location at
exactly the same time over more than 50 attempts (each time rebooting)
and using two different released versions of the program plus my own
compiled versions with debug statements added. Or could it? How would
I test that?

I wonder if the video card could be moved to the PowerCenter machine
to see if the problem follows it? Probably not... The other machine
does have a different card.


* Re: Kernel crashing problems on oldworld
  2001-11-26  1:09   ` Chris Tillman
@ 2001-11-26 19:49     ` Andrew Sharp
  0 siblings, 0 replies; 4+ messages in thread
From: Andrew Sharp @ 2001-11-26 19:49 UTC (permalink / raw)
  To: linuxppc-dev, debian-powerpc


Chris Tillman wrote:
>
> On Sat, Nov 24, 2001 at 08:44:31PM -0800, Andrew Sharp wrote:
> >
> > Thanks.  Are you sure it's not just some memory that's gone bad
> > rather than a conflict with the video card?
> >
>
> Well, I don't know how it could hit the same memory location at
> exactly the same time over more than 50 attempts (each time rebooting)
> and using two different released versions of the program plus my own
> compiled versions with debug statements added. Or could it? How would
> I test that?
>
> I wonder if the video card could be moved to the PowerCenter machine
> to see if the problem follows it? Probably not... The other machine
> does have a different card.

It may be an entire bank or line or whatever, not necessarily a
particular single addressed location.  You can try removing
individual sticks of memory, or moving memory around to your other
boxes to see if the problem follows any particular stick.
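
As a rough software check before pulling sticks, a pattern test can
be run from userspace. This is only a sketch under stated
assumptions: pattern_errors() and test_region() are hypothetical
names, the process only sees whatever physical pages the kernel
happens to hand it, and a region smaller than the CPU cache may
never touch the RAM at all. A clean run therefore proves little;
swapping sticks remains the more reliable test.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: fill buf with a pattern, read it back and
 * count the words that do not match what was written. */
size_t pattern_errors(volatile uint32_t *buf, size_t words, uint32_t pattern)
{
    size_t i, errors = 0;

    assert(buf != NULL);
    /* XOR with the index so neighbouring words differ; a stuck
     * address line then shows up as a mismatch. */
    for (i = 0; i < words; i++)
        buf[i] = pattern ^ (uint32_t)i;
    for (i = 0; i < words; i++)
        if (buf[i] != (pattern ^ (uint32_t)i))
            errors++;
    return errors;
}

/* Hypothetical helper: run several bit patterns over a freshly
 * allocated region. Returns the total mismatch count, or
 * (size_t)-1 if the allocation failed. */
size_t test_region(size_t bytes)
{
    static const uint32_t patterns[] = {
        0x00000000u, 0xFFFFFFFFu, 0xAAAAAAAAu, 0x55555555u
    };
    size_t words = bytes / sizeof(uint32_t);
    size_t p, errors = 0;
    uint32_t *buf = malloc(words * sizeof(uint32_t));

    if (buf == NULL)
        return (size_t)-1;
    for (p = 0; p < sizeof patterns / sizeof patterns[0]; p++)
        errors += pattern_errors(buf, words, patterns[p]);
    free(buf);
    return errors;
}
```

Calling test_region() in a loop with a size close to the amount of
free RAM covers more of the installed memory than a single small
run does.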

a

