* [2.4.1] system goes glacial, Reiser on /usr doesn't sync
@ 2001-02-20 10:16 ` Kevin Turner
2001-02-20 10:30 ` Keith Owens
` (3 more replies)
0 siblings, 4 replies; 8+ messages in thread
From: Kevin Turner @ 2001-02-20 10:16 UTC (permalink / raw)
To: linux-kernel
Version:
Linux version 2.4.1-pre12 (gcc version 2.95.3 20010125 (prerelease))
Possible suspect players:
dpkg seems to trigger the bug
ReiserFS is the partition that doesn't sync
binfmt_misc shows up in the call traces.
Symptoms:
The system assumes glacial speeds. If you're *lucky*, you'll see one
widget re-paint in X before the next ice age. Ctrl-alt-delete is
unresponsive, as are attempts to start processes via the network or
joystick port. Keypresses to programs such as getty are not echoed.
All program output to console and network is stopped dead. If you leave
for a several-hour-long coffee break and come back to it, there's still
no evidence that you banged on the keyboard.
What does work: If not in X, I may switch between VTs. Keypresses are
echoed on VTs without running processes. And the machine still pings.
Occasionally (a few times a minute at most), you'll see the HDD LED
blink.
When it happens: Sometimes, not always, when accessing the package
database or updating installed packages with Debian's "dpkg" tool.
When it *didn't* happen: before I switched to kernel 2.4.x.
Debugging information:
After I had about enough of this, I recompiled the kernel with
CONFIG_REISERFS_CHECK and Magic SysRq. ReiserFS gives no debugging
messages of any sort during these episodes. Magic SysRq, however, does
work.
Magic-tErminate clears everything up and the system resumes normal
operation.
Magic-Sync appears not to work, as "Ok" and "Done" don't appear. But
after Magic-E, they do show up. Logs show that one partition actually
sync'd before termination, the others didn't. The partition where it
stalled just so happens to be a ReiserFS partition (3.5.x format).
kernel: SysRq: Emergency Sync
kernel: Syncing device 03:01 ... OK
kernel: Syncing device 16:02 ... <6>SysRq: Terminate All Tasks
kernel: OK
kernel: Syncing device 16:01 ... OK
kernel: Syncing device 03:03 ... OK
03:01 - ext2 /
16:02 - ReiFS /usr (note: /var is a symlink to /usr/root-var)
16:01 - ext2 /home
03:03 - ext2 /other
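
The log and the device list above can be cross-checked mechanically. What follows is a minimal sketch, not anything from the original report: `stalled_devices` and `decode_ide` are hypothetical helper names, and the decoding assumes that the kernel prints major:minor in hex and that majors 3 and 22 are the first and second IDE channels with 64 minors per drive.

```python
import re

LOG = """\
kernel: SysRq: Emergency Sync
kernel: Syncing device 03:01 ... OK
kernel: Syncing device 16:02 ... <6>SysRq: Terminate All Tasks
kernel: OK
kernel: Syncing device 16:01 ... OK
kernel: Syncing device 03:03 ... OK
"""

SYNC_RE = re.compile(r"Syncing device ([0-9a-f]{2}:[0-9a-f]{2}) \.\.\. (.*)$")

# Assumed major numbers: 3 = ide0 (hda, hdb), 22 = ide1 (hdc, hdd).
IDE_MAJORS = {3: ("hda", "hdb"), 22: ("hdc", "hdd")}


def stalled_devices(log):
    """Return devices whose sync did not report OK on the same line."""
    stalled = []
    for line in log.splitlines():
        m = SYNC_RE.search(line)
        if m and m.group(2).strip() != "OK":
            stalled.append(m.group(1))
    return stalled


def decode_ide(dev):
    """Translate a hex 'major:minor' pair into an IDE device name."""
    major, minor = (int(part, 16) for part in dev.split(":"))
    drive = IDE_MAJORS[major][minor // 64]   # 64 minors per drive
    partition = minor % 64                   # 0 would mean the whole disk
    return drive if partition == 0 else f"{drive}{partition}"


for dev in stalled_devices(LOG):
    print(dev, "->", decode_ide(dev))        # 16:02 -> hdc2
```

Under those assumptions the only device that failed to report OK before the terminate is 16:02, the ReiserFS /usr partition in the list above.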
In ReiserFS's defense, /usr and /var are the partitions that dpkg was
most likely to have open files on, so perhaps that was it.
When I asked magic about running processes, it said that, among other
things, there was one apt-get, one dpkg, and three instances of
dpkg-deb. Of those, dpkg was the only one in 'R' state, the others were
'S'. Here's the call trace:
dpkg R 00000000 0 9278 9152 9285 (NOTLB)
Call Trace: [__alloc_pages+372/720] [grow_buffers+62/364]
[refill_freelist+28/48] [refill_freelist+41/48] [getblk+242/264]
[binfmt_misc:__insmod_binfmt_misc_O/lib/modules/2.4.1-pre12/kernel/fs/bi+-621596/96]
[binfmt_misc:__insmod_binfmt_misc_O/lib/modules/2.4.1-pre12/kernel/fs/bi+-970133/96]
[binfmt_misc:__insmod_binfmt_misc_O/lib/modules/2.4.1-pre12/kernel/fs/bi+-750728/96]
[binfmt_misc:__insmod_binfmt_misc_O/lib/modules/2.4.1-pre12/kernel/fs/bi+-976628/96]
[binfmt_misc:__insmod_binfmt_misc_O/lib/modules/2.4.1-pre12/kernel/fs/bi+-918150/96]
[binfmt_misc:__insmod_binfmt_misc_O/lib/modules/2.4.1-pre12/kernel/fs/bi+-976370/96]
[binfmt_misc:__insmod_binfmt_misc_O/lib/modules/2.4.1-pre12/kernel/fs/bi+-1080871/96]
[permission+42/48] [vfs_link+141/192] [sys_link+202/300]
Why binfmt_misc? I'll be burned if I know. It is true that one of the
packages I was installing this run was related to binfmt_misc
("binfmt-support"), but that wasn't the case the previous times this bug
has happened. On the other hand, I don't have call traces for those
times.
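
One sanity check that can be applied to a symbolized trace like the one above: in a `[symbol+offset/size]` entry, the offset should be non-negative and smaller than the size. The sketch below assumes that reading of the format and that both numbers are decimal; `plausible` is a hypothetical helper, and the entries are copied from the trace.

```python
import re

ENTRY_RE = re.compile(r"\[(.+)\+(-?\d+)/(\d+)\]")


def plausible(entry):
    """True if the decimal offset lies inside the symbol's size."""
    m = ENTRY_RE.fullmatch(entry)
    if m is None:
        return False
    offset, size = int(m.group(2)), int(m.group(3))
    return 0 <= offset < size


entries = [
    "[__alloc_pages+372/720]",
    "[getblk+242/264]",
    "[binfmt_misc:__insmod_binfmt_misc_O/lib/modules/2.4.1-pre12/kernel/fs/bi+-621596/96]",
    "[vfs_link+141/192]",
]
for e in entries:
    print("ok     " if plausible(e) else "suspect", e)
```

The binfmt_misc entries are the only ones that fail this check, which suggests the symbol name attached to them should not be trusted.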
Linux version 2.4.1-pre12 (gcc version 2.95.3 20010125 (prerelease))
dpkg version 1.3.8.1
arch is i586 (GenuineIntel Pentium MMX)
disks are IDE
48 MB RAM
swap > 200 MB
Motherboard: ABIT-TX5
Host bridge: Intel Corporation 430TX - 82439TX MTXC (rev 01)
IDE interface: Intel Corporation 82371AB PIIX4 IDE (rev 01)
More logs are posted at:
http://olug.cs.oberlin.edu/~kturner/bugs/glacial-2.4.1.html
The kernel's configuration is at:
http://olug.cs.oberlin.edu/~kturner/bugs/config-2.4.1-pre12
And yes, I will try upgrading to a newer kernel. But since I can't
quite reproduce this on demand yet, I thought I'd report it now while
the logbits are fresh.
--
Kevin Turner <acapnotic@users.sourceforge.net> | OpenPGP encryption welcome here
I usually subscribe to lists I post to, but for LKML I'm making an exception =)
Please feel free to CC me on all follow-ups.
* Re: [2.4.1] system goes glacial, Reiser on /usr doesn't sync
2001-02-20 10:16 ` [2.4.1] system goes glacial, Reiser on /usr doesn't sync Kevin Turner
@ 2001-02-20 10:30 ` Keith Owens
2001-02-20 11:33 ` David
` (2 subsequent siblings)
3 siblings, 0 replies; 8+ messages in thread
From: Keith Owens @ 2001-02-20 10:30 UTC (permalink / raw)
To: He-Who-Is-Not-Subscribed-to-LKML; +Cc: linux-kernel
On Tue, 20 Feb 2001 02:16:09 -0800,
Kevin Turner <acapnotic@users.sourceforge.net> wrote:
>[binfmt_misc:__insmod_binfmt_misc_O/lib/modules/2.4.1-pre12/kernel/fs/bi+-621596/96]
>
>Why binfmt_misc? I'll be burned if I know.
Because klogd conversion of addresses to symbols is a pile of crud.
Turn off klogd symbol conversion (klogd -x) and run the raw addresses
through ksymoops.
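
For readers unfamiliar with the step being discussed: turning a raw address into `symbol+offset` amounts to finding the nearest symbol at or below that address. The following is a rough sketch of that lookup, not klogd's or ksymoops's actual code. `resolve` is a hypothetical helper, and the start addresses are derived from the ksymoops output posted later in this thread (address minus offset).

```python
import bisect

# (start address, name), sorted by address. Each start is computed from
# the ksymoops trace in this thread, e.g. 0xc0127b9c - 0x174.
SYMBOLS = [
    (0xC0127A28, "__alloc_pages"),
    (0xC012DAB8, "refill_freelist"),
    (0xC012DDD0, "getblk"),
    (0xC012F8F4, "grow_buffers"),
]
STARTS = [start for start, _ in SYMBOLS]


def resolve(addr):
    """Return 'name+offset' (hex offset) for the nearest symbol at or below addr."""
    i = bisect.bisect_right(STARTS, addr) - 1
    if i < 0:
        return hex(addr)                 # below every known symbol
    start, name = SYMBOLS[i]
    return f"{name}+{addr - start:x}"


print(resolve(0xC0127B9C))   # __alloc_pages+174
print(resolve(0xC012DEC2))   # getblk+f2
print(resolve(0xC386226B))   # a module address, resolved against the wrong table
```

The last call illustrates the general hazard: when the table does not contain the symbols of the module that owns an address, the lookup still returns a name, just not a meaningful one.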
* Re: [2.4.1] system goes glacial, Reiser on /usr doesn't sync
2001-02-20 10:16 ` [2.4.1] system goes glacial, Reiser on /usr doesn't sync Kevin Turner
2001-02-20 10:30 ` Keith Owens
@ 2001-02-20 11:33 ` David
2001-02-20 12:24 ` Kevin Turner
2001-02-20 16:25 ` Chris Mason
2001-02-20 12:34 ` Kevin Turner
2001-02-23 23:35 ` Dwayne C. Litzenberger
3 siblings, 2 replies; 8+ messages in thread
From: David @ 2001-02-20 11:33 UTC (permalink / raw)
To: He-Who-Is-Not-Subscribed-to-LKML; +Cc: linux-kernel
Kevin Turner wrote:
> Version:
> Linux version 2.4.1-pre12 (gcc version 2.95.3 20010125 (prerelease))
>
> Possible suspect players:
> dpkg seems to trigger the bug
> ReiserFS is the partition that doesn't sync
> binfmt_misc shows up in the call traces.
>
> Symptoms:
>
> The system assumes glacial speeds. If you're *lucky*, you'll see one
> widget re-paint in X before the next ice age. Ctrl-alt-delete is
> unresponsive, as are attempts to start processes via the network or
> joystick port. Keypresses to programs such as getty are not echoed.
> All program output to console and network is stopped dead. If you leave
> for a several-hour-long coffee break and come back to it, there's still
> no evidence that you banged on the keyboard.
Wild shot in the dark....I'd lay odds that you had about 6-7 Megs free
in your buffers/cache line, yes?
-d
* Re: [2.4.1] system goes glacial, Reiser on /usr doesn't sync
2001-02-20 11:33 ` David
@ 2001-02-20 12:24 ` Kevin Turner
2001-02-20 16:25 ` Chris Mason
1 sibling, 0 replies; 8+ messages in thread
From: Kevin Turner @ 2001-02-20 12:24 UTC (permalink / raw)
To: linux-kernel
On Tue, Feb 20, 2001 at 03:33:33AM -0800, David wrote:
> Wild shot in the dark....I'd lay odds that you had about 6-7 Megs free
> in your buffers/cache line, yes?
David! You're psychic!
SysRq: Show Memory
Mem-info:
Free pages: 712kB ( 0kB HighMem)
( Active: 1779, inactive_dirty: 1507, inactive_clean: 0, free: 178 (192 384 576) )
0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 1*512kB 0*1024kB 0*2048kB = 512kB)
0*4kB 1*8kB 0*16kB 0*32kB 1*64kB 1*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB = 200kB)
= 0kB)
Swap cache: add 73994, delete 73947, find 19040/117035
Free swap: 206664kB
12288 pages of RAM
0 pages of HIGHMEM
653 reserved pages
4740 pages shared
47 pages swap cached
0 pages in page table cache
Buffer memory: 6028kB
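
The figures in this dump are internally consistent, which a few lines of arithmetic can confirm. This is a minimal sketch assuming 4 kB pages (i386); `buddy_total_kb` is a hypothetical helper, and the two zone lines are copied from the output above.

```python
import re

PAGE_KB = 4  # i386 page size, assumed


def buddy_total_kb(line):
    """Sum the 'count*sizekB' terms of one zone's free-list line."""
    return sum(int(n) * int(size)
               for n, size in re.findall(r"(\d+)\*(\d+)kB", line))


zones = [
    "0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 1*512kB 0*1024kB 0*2048kB",
    "0*4kB 1*8kB 0*16kB 0*32kB 1*64kB 1*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB",
]
free_kb = sum(buddy_total_kb(z) for z in zones)

print(free_kb)                    # 712, matches "Free pages: 712kB"
print(178 * PAGE_KB)              # 712, the "free: 178" page count in kB
print(12288 * PAGE_KB // 1024)    # 48, total RAM in MB
```

If the three numbers in parentheses after "free: 178" are the min/low/high free-page thresholds (an assumption about this kernel's output format), then 178 free pages is below even the lowest of them, 192.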
* Re: [2.4.1] system goes glacial, Reiser on /usr doesn't sync
2001-02-20 10:16 ` [2.4.1] system goes glacial, Reiser on /usr doesn't sync Kevin Turner
2001-02-20 10:30 ` Keith Owens
2001-02-20 11:33 ` David
@ 2001-02-20 12:34 ` Kevin Turner
2001-02-23 23:35 ` Dwayne C. Litzenberger
3 siblings, 0 replies; 8+ messages in thread
From: Kevin Turner @ 2001-02-20 12:34 UTC (permalink / raw)
To: linux-kernel
Thanks to Keith who pointed out that "klogd conversion of addresses to
symbols is a pile of crud." Here's what I'm getting out of ksymoops
now. It's not as much, since it's just the parts I copied down by
hand... (I'll get it next time. Whenever that happens to be.
Installing the ksymoops package didn't trigger it.)
from magic-showPc:
EIP: 0010:[<c0126533>] CPU: 0 EFLAGS: 00000207
Using defaults from ksymoops -t elf32-i386 -a i386
EAX: c0206f98 EBX: c10af1e0 ECX: c10af1fc EDX: c1091200
ESI: c10af1fc EDI: 00000000 EBP: 000005ac DS: 0018 ES: 0018
CR0: 8005003b CR2: 08052beb CR3: 00839000 CR4: 00000010
Call Trace: [<c0126dec>] [<c0126f66>] [<c0127c58>] [<c0127d0c>] [<c01283b1>] [<c011e0ea>] [<c011e140>]
[<c011e485>] [<c010f83c>] [<c010f704>] [<c0110336>] [<c0127d5e>] [<c0127d84>] [<c0139cea>] [<c0138fe8>]
[<c0108ee4>] [<c013a3e4>] [<c0108da3>]
Warning (Oops_read): Code line not seen, dumping what data is available
>>EIP; c0126533 <page_launder+2d3/8a0> <=====
Trace; c0126dec <do_try_to_free_pages+34/7c>
Trace; c0126f66 <try_to_free_pages+22/2c>
Trace; c0127c58 <__alloc_pages+230/2d0>
Trace; c0127d0c <__get_free_pages+14/20>
Trace; c01283b1 <read_swap_cache_async+31/a0>
Trace; c011e0ea <swapin_readahead+8e/c4>
Trace; c011e140 <do_swap_page+20/114>
Trace; c011e485 <handle_mm_fault+fd/154>
Trace; c010f83c <do_page_fault+138/3fc>
Trace; c010f704 <do_page_fault+0/3fc>
Trace; c0110336 <schedule+26a/394>
Trace; c0127d5e <__free_pages+1a/1c>
Trace; c0127d84 <free_pages+24/28>
Trace; c0139cea <poll_freewait+3a/44>
Trace; c0138fe8 <do_fcntl+14c/204>
Trace; c0108ee4 <error_code+34/40>
Trace; c013a3e4 <sys_select+3bc/494>
Trace; c0108da3 <system_call+33/40>
The running dpkg process:
Warning (Oops_read): Code line not seen, dumping what data is available
Trace; c0127b9c <__alloc_pages+174/2d0>
Trace; c012f932 <grow_buffers+3e/16c>
Trace; c012dad4 <refill_freelist+1c/30>
Trace; c012dae1 <refill_freelist+29/30>
Trace; c012dec2 <getblk+f2/108>
Trace; c38b7378 <[reiserfs].bss.end+45979/af661>
Trace; c386226b <[reiserfs]do_journal_end+633/ab4>
Trace; c3897b78 <[reiserfs].bss.end+26179/af661>
Trace; c386090c <[reiserfs]do_journal_begin_r+188/258>
Trace; c012c0bc <filp_close+5c/64>
Trace; c0108da3 <system_call+33/40>
The running dpkg process, several minutes later, is the same, but
the "bss.end" line above the 'do_journal_end' call reads:
Trace; c38b73e4 <[reiserfs].bss.end+459e5/af661>
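
The first five frames of the klogd trace in the original report and of the ksymoops trace above can be compared directly. The sketch below assumes klogd printed offsets and sizes in decimal while ksymoops prints them in hex, which the numbers are consistent with; `parse` is a hypothetical helper.

```python
klogd = ["__alloc_pages+372/720", "grow_buffers+62/364",
         "refill_freelist+28/48", "refill_freelist+41/48", "getblk+242/264"]
ksymoops = ["__alloc_pages+174/2d0", "grow_buffers+3e/16c",
            "refill_freelist+1c/30", "refill_freelist+29/30", "getblk+f2/108"]


def parse(entry, base):
    """Split 'name+offset/size' and convert both numbers from the given base."""
    name, rest = entry.split("+")
    offset, size = (int(x, base) for x in rest.split("/"))
    return name, offset, size


matches = [parse(k, 10) == parse(s, 16) for k, s in zip(klogd, ksymoops)]
print(all(matches))
```

Under that assumption the two tools agree on every in-kernel frame; they diverge only where the addresses fall inside a module, which klogd labelled binfmt_misc and ksymoops labels reiserfs.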
* Re: [2.4.1] system goes glacial, Reiser on /usr doesn't sync
2001-02-20 11:33 ` David
2001-02-20 12:24 ` Kevin Turner
@ 2001-02-20 16:25 ` Chris Mason
1 sibling, 0 replies; 8+ messages in thread
From: Chris Mason @ 2001-02-20 16:25 UTC (permalink / raw)
To: David, He-Who-Is-Not-Subscribed-to-LKML; +Cc: linux-kernel
On Tuesday, February 20, 2001 03:33:33 AM -0800 David <david@blue-labs.org>
wrote:
> Kevin Turner wrote:
>
>> Version:
>> Linux version 2.4.1-pre12 (gcc version 2.95.3 20010125 (prerelease))
>>
>> Possible suspect players:
>> dpkg seems to trigger the bug
>> ReiserFS is the partition that doesn't sync
>> binfmt_misc shows up in the call traces.
>>
>> Symptoms:
>>
>> The system assumes glacial speeds. If you're *lucky*, you'll see one
>> widget re-paint in X before the next ice age. Ctrl-alt-delete is
>> unresponsive, as are attempts to start processes via the network or
>> joystick port. Keypresses to programs such as getty are not echoed.
>> All program output to console and network is stopped dead. If you leave
>> for a several-hour-long coffee break and come back to it, there's still
>> no evidence that you banged on the keyboard.
>
>
> Wild shot in the dark....I'd lay odds that you had about 6-7 Megs free in
> your buffers/cache line, yes?
>
David, have any of Rik's patches helped here?
-chris
* Re: [2.4.1] system goes glacial, Reiser on /usr doesn't sync
2001-02-20 10:16 ` [2.4.1] system goes glacial, Reiser on /usr doesn't sync Kevin Turner
` (2 preceding siblings ...)
2001-02-20 12:34 ` Kevin Turner
@ 2001-02-23 23:35 ` Dwayne C. Litzenberger
2001-02-24 2:21 ` Dwayne C. Litzenberger
3 siblings, 1 reply; 8+ messages in thread
From: Dwayne C. Litzenberger @ 2001-02-23 23:35 UTC (permalink / raw)
To: Kevin Turner; +Cc: linux-kernel
[-- Attachment #1: Type: text/plain, Size: 460 bytes --]
I have the same problem with 2.4.1 (and 2.4.2). Two processes that are
actively using the disk (multiple files) seem to deadlock the system. Killing
the right process (SysRq-K) seems to fix things.
I'm kind of new to kernel debugging. Anyone want to guide me through it?
--
Dwayne C. Litzenberger - dlitz@dlitz.net
- Please always Cc to me when replying to me on the lists.
- See the mail headers for GPG/advertising/homepage information.
[-- Attachment #2: Type: application/pgp-signature, Size: 240 bytes --]
* Re: [2.4.1] system goes glacial, Reiser on /usr doesn't sync
2001-02-23 23:35 ` Dwayne C. Litzenberger
@ 2001-02-24 2:21 ` Dwayne C. Litzenberger
0 siblings, 0 replies; 8+ messages in thread
From: Dwayne C. Litzenberger @ 2001-02-24 2:21 UTC (permalink / raw)
To: Kevin Turner; +Cc: linux-kernel
[-- Attachment #1: Type: text/plain, Size: 373 bytes --]
I think Alan fixed it.
I've been running 2.4.2-ac3 for about half an hour now under heavy disk
usage (apt-get + 2 kernel builds + Netscape/X11), and it hasn't locked up
yet.
--
Dwayne C. Litzenberger - dlitz@dlitz.net
- Please always Cc to me when replying to me on the lists.
- See the mail headers for GPG/advertising/homepage information.
[-- Attachment #2: Type: application/pgp-signature, Size: 240 bytes --]