public inbox for linux-kernel@vger.kernel.org
* Oops in kswapd, 2.4.19 kernel and before
@ 2002-10-28 10:24 Hugo Mills
  2002-10-28 10:47 ` Morten Helgesen
  2002-10-28 12:29 ` Andrea Arcangeli
  0 siblings, 2 replies; 5+ messages in thread
From: Hugo Mills @ 2002-10-28 10:24 UTC (permalink / raw)
  To: Rik van Riel, Andrea Arcangeli, LKML

[-- Attachment #1: Type: text/plain, Size: 10870 bytes --]

   Hi,

   This is the third time I've tried to report this problem, with no
response so far. One last try. If you're not interested, please tell
me and I won't bother you any more...

   I'm getting regular oopsen in kswapd on my 2.4.19 kernel. They
generally appear to happen while running Amanda (a tape backup
utility) -- although I've not identified exactly which component of
Amanda triggers it. The machine is lightly stressed with regard to
memory usage, although I suspect much of it is (currently) swapped out
(I'm running postgres and apache, but they don't get much use at the
moment):

hrm@vlad:hrm $ free
             total       used       free     shared    buffers     cached
Mem:        127240     125264       1976          0       2576      35020
-/+ buffers/cache:      87668      39572
Swap:       262132      53240     208892

   After the oops, my kswapd is zombied:

hrm@vlad:hrm $ ps ax | grep kswapd
    5 ?        Z      0:11 [kswapd <defunct>]

although the machine does appear to continue to function without
problems. I have seen precisely similar effects on most of the
previous 2.4.x kernels.

   Decoded oopsen are below (they _are_ decoded with the right system
maps, despite ksymoops's concerns). If there's anything else that's
needed in order to track this down, please let me know.

   Thanks,
   Hugo.

-----

ksymoops 2.4.6 on i586 2.4.19.  Options used
     -V (default)
     -k /proc/ksyms (default)
     -l /proc/modules (default)
     -o /lib/modules/2.4.19/ (default)
     -m /boot/System.map-2.4.19 (default)

Warning: You did not tell me where to find symbol information.  I will
assume that the log matches the kernel and modules that are running
right now and I'll use the default options above for symbol resolution.
If the current kernel and/or modules do not match the log, you can get
more accurate output by telling me the kernel version and where to find
map, modules, ksyms etc.  ksymoops -h explains the options.

Oct 24 06:31:14 vlad kernel: c014248a
Oct 24 06:31:14 vlad kernel: Oops: 0000
Oct 24 06:31:14 vlad kernel: CPU:    0
Oct 24 06:31:14 vlad kernel: EIP:    0010:[iput+46/432]    Not tainted
Oct 24 06:31:14 vlad kernel: EFLAGS: 00010206
Oct 24 06:31:14 vlad kernel: eax: 00000000   ebx: c67d8800   ecx: c67d8810   edx: c67d8810
Oct 24 06:31:14 vlad kernel: esi: 476f7200   edi: 00000000   ebp: c7f9ff3c   esp: c7f9ff30
Oct 24 06:31:14 vlad kernel: ds: 0018   es: 0018   ss: 0018
Oct 24 06:31:14 vlad kernel: Process kswapd (pid: 5, stackpage=c7f9f000)
Oct 24 06:31:14 vlad kernel: Stack: c088ef78 c088ef60 c67d8800 c7f9ff54 c01405e6 c67d8800 00000011 000001d0 
Oct 24 06:31:14 vlad kernel:        00000011 c7f9ff60 c01408bc 0000172d c7f9ff84 c012b0b1 00000002 000001d0 
Oct 24 06:31:14 vlad kernel:        00000002 000001d0 c0287d74 00000002 c0287d74 c7f9ff9c c012b101 00000011 
Oct 24 06:31:14 vlad kernel: Call Trace:    [prune_dcache+198/316] [shrink_dcache_memory+28/52] [shrink_caches+105/132] [try_to_free_pages+53/88] [kswapd_balance_pgdat+76/160]
Oct 24 06:31:14 vlad kernel: Code: 8b 46 20 85 c0 74 02 89 c7 85 ff 74 0d 8b 47 10 85 c0 74 06 
Using defaults from ksymoops -t elf32-i386 -a i386


>>ebx; c67d8800 <_end+64d5fc8/852f828>
>>ecx; c67d8810 <_end+64d5fd8/852f828>
>>edx; c67d8810 <_end+64d5fd8/852f828>
>>ebp; c7f9ff3c <_end+7c9d704/852f828>
>>esp; c7f9ff30 <_end+7c9d6f8/852f828>

Code;  00000000 Before first symbol
00000000 <_EIP>:
Code;  00000000 Before first symbol
   0:   8b 46 20                  mov    0x20(%esi),%eax
Code;  00000003 Before first symbol
   3:   85 c0                     test   %eax,%eax
Code;  00000005 Before first symbol
   5:   74 02                     je     9 <_EIP+0x9>
Code;  00000007 Before first symbol
   7:   89 c7                     mov    %eax,%edi
Code;  00000009 Before first symbol
   9:   85 ff                     test   %edi,%edi
Code;  0000000b Before first symbol
   b:   74 0d                     je     1a <_EIP+0x1a>
Code;  0000000d Before first symbol
   d:   8b 47 10                  mov    0x10(%edi),%eax
Code;  00000010 Before first symbol
  10:   85 c0                     test   %eax,%eax
Code;  00000012 Before first symbol
  12:   74 06                     je     1a <_EIP+0x1a>


1 warning issued.  Results may not be reliable.

-----

ksymoops 2.4.6 on i586 2.4.19.  Options used
     -V (default)
     -k /proc/ksyms (default)
     -l /proc/modules (default)
     -o /lib/modules/2.4.19/ (default)
     -m /boot/System.map-2.4.19 (default)

Warning: You did not tell me where to find symbol information.  I will
assume that the log matches the kernel and modules that are running
right now and I'll use the default options above for symbol resolution.
If the current kernel and/or modules do not match the log, you can get
more accurate output by telling me the kernel version and where to find
map, modules, ksyms etc.  ksymoops -h explains the options.

Oct 27 01:46:00 vlad kernel: Unable to handle kernel paging request at virtual address 47804220
Oct 27 01:46:00 vlad kernel: c014248a
Oct 27 01:46:00 vlad kernel: *pde = 00000000
Oct 27 01:46:00 vlad kernel: Oops: 0000
Oct 27 01:46:00 vlad kernel: CPU:    0
Oct 27 01:46:00 vlad kernel: EIP:    0010:[iput+46/432]    Not tainted
Oct 27 01:46:00 vlad kernel: EFLAGS: 00010206
Oct 27 01:46:00 vlad kernel: eax: 00000000   ebx: c67c8800   ecx: c67c8810   edx: c67c8810
Oct 27 01:46:00 vlad kernel: esi: 47804200   edi: 00000000   ebp: c7f9ff3c   esp: c7f9ff30
Oct 27 01:46:00 vlad kernel: ds: 0018   es: 0018   ss: 0018
Oct 27 01:46:00 vlad kernel: Process kswapd (pid: 5, stackpage=c7f9f000)
Oct 27 01:46:00 vlad kernel: Stack: c6a93d38 c6a93d20 c67c8800 c7f9ff54 c01405e6 c67c8800 00000005 000001d0
Oct 27 01:46:00 vlad kernel:        00000020 c7f9ff60 c01408bc 000009e1 c7f9ff84 c012b0b1 00000006 000001d0
Oct 27 01:46:00 vlad kernel:        00000006 000001d0 c0287d74 00000006 c0287d74 c7f9ff9c c012b101 00000020
Oct 27 01:46:00 vlad kernel: Call Trace:    [prune_dcache+198/316] [shrink_dcache_memory+28/52] [shrink_caches+105/132] [try_to_free_pages+53/88] [kswapd_balance_pgdat+76/160]
Oct 27 01:46:00 vlad kernel: Code: 8b 46 20 85 c0 74 02 89 c7 85 ff 74 0d 8b 47 10 85 c0 74 06
Using defaults from ksymoops -t elf32-i386 -a i386


>>ebx; c67c8800 <_end+64c5fc8/852f828>
>>ecx; c67c8810 <_end+64c5fd8/852f828>
>>edx; c67c8810 <_end+64c5fd8/852f828>
>>ebp; c7f9ff3c <_end+7c9d704/852f828>
>>esp; c7f9ff30 <_end+7c9d6f8/852f828>

Code;  00000000 Before first symbol
00000000 <_EIP>:
Code;  00000000 Before first symbol
   0:   8b 46 20                  mov    0x20(%esi),%eax
Code;  00000003 Before first symbol
   3:   85 c0                     test   %eax,%eax
Code;  00000005 Before first symbol
   5:   74 02                     je     9 <_EIP+0x9>
Code;  00000007 Before first symbol
   7:   89 c7                     mov    %eax,%edi
Code;  00000009 Before first symbol
   9:   85 ff                     test   %edi,%edi
Code;  0000000b Before first symbol
   b:   74 0d                     je     1a <_EIP+0x1a>
Code;  0000000d Before first symbol
   d:   8b 47 10                  mov    0x10(%edi),%eax
Code;  00000010 Before first symbol
  10:   85 c0                     test   %eax,%eax
Code;  00000012 Before first symbol
  12:   74 06                     je     1a <_EIP+0x1a>


1 warning issued.  Results may not be reliable.

-----

ksymoops 2.4.6 on i586 2.4.19.  Options used
     -V (default)
     -k /proc/ksyms (default)
     -l /proc/modules (default)
     -o /lib/modules/2.4.19/ (default)
     -m /boot/System.map-2.4.19 (default)

Warning: You did not tell me where to find symbol information.  I will
assume that the log matches the kernel and modules that are running
right now and I'll use the default options above for symbol resolution.
If the current kernel and/or modules do not match the log, you can get
more accurate output by telling me the kernel version and where to find
map, modules, ksyms etc.  ksymoops -h explains the options.

Oct 28 08:04:49 vlad kernel: Unable to handle kernel paging request at virtual address 47880220
Oct 28 08:04:49 vlad kernel: c0141c6a
Oct 28 08:04:49 vlad kernel: *pde = 00000000
Oct 28 08:04:49 vlad kernel: Oops: 0000
Oct 28 08:04:49 vlad kernel: CPU:    0
Oct 28 08:04:49 vlad kernel: EIP:    0010:[clear_inode+86/168]    Not tainted
Oct 28 08:04:49 vlad kernel: EFLAGS: 00010206
Oct 28 08:04:49 vlad kernel: eax: 47880200   ebx: c67c8800   ecx: c67c8808   edx: c67c8818
Oct 28 08:04:49 vlad kernel: esi: c7f9ff44   edi: c73e8a28   ebp: c7f9ff14   esp: c7f9ff10
Oct 28 08:04:49 vlad kernel: ds: 0018   es: 0018   ss: 0018
Oct 28 08:04:49 vlad kernel: Process kswapd (pid: 5, stackpage=c7f9f000)
Oct 28 08:04:49 vlad kernel: Stack: c67c8800 c7f9ff28 c0141cff c67c8800 c5829648 c5829640 c7f9ff4c c0141f24 
Oct 28 08:04:49 vlad kernel:        c7f9ff44 0000000c 000001d0 00000020 000005df c02d3428 c36d5de8 c7f9ff58 
Oct 28 08:04:49 vlad kernel:        c0141f5c 00000000 c7f9ff84 c012b0bb 00000006 000001d0 00000006 000001d0 
Oct 28 08:04:49 vlad kernel: Call Trace:    [dispose_list+67/96] [prune_icache+164/192] [shrink_icache_memory+28/52] [shrink_caches+115/132] [try_to_free_pages+53/88]
Oct 28 08:04:49 vlad kernel: Code: 8b 40 20 85 c0 74 0f 8b 40 30 85 c0 74 08 53 ff d0 83 c4 04 
Using defaults from ksymoops -t elf32-i386 -a i386


>>ebx; c67c8800 <_end+64c5fc8/852f828>
>>ecx; c67c8808 <_end+64c5fd0/852f828>
>>edx; c67c8818 <_end+64c5fe0/852f828>
>>esi; c7f9ff44 <_end+7c9d70c/852f828>
>>edi; c73e8a28 <_end+70e61f0/852f828>
>>ebp; c7f9ff14 <_end+7c9d6dc/852f828>
>>esp; c7f9ff10 <_end+7c9d6d8/852f828>

Code;  00000000 Before first symbol
00000000 <_EIP>:
Code;  00000000 Before first symbol
   0:   8b 40 20                  mov    0x20(%eax),%eax
Code;  00000003 Before first symbol
   3:   85 c0                     test   %eax,%eax
Code;  00000005 Before first symbol
   5:   74 0f                     je     16 <_EIP+0x16>
Code;  00000007 Before first symbol
   7:   8b 40 30                  mov    0x30(%eax),%eax
Code;  0000000a Before first symbol
   a:   85 c0                     test   %eax,%eax
Code;  0000000c Before first symbol
   c:   74 08                     je     16 <_EIP+0x16>
Code;  0000000e Before first symbol
   e:   53                        push   %ebx
Code;  0000000f Before first symbol
   f:   ff d0                     call   *%eax
Code;  00000011 Before first symbol
  11:   83 c4 04                  add    $0x4,%esp


1 warning issued.  Results may not be reliable.
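
As a cross-check on the three dumps above, the faulting address in each case is the base register of the first decoded instruction plus its 0x20 displacement. The sketch below redoes that arithmetic with the register values copied from the logs. PAGE_OFFSET = 0xC0000000 is an assumption (the default 3G/1G split on i386), and the helper name analyse is made up for illustration. The Oct 24 log contains no "Unable to handle" line, so there is no reported address to compare against for that one.

```python
# Register values and displacements are copied from the oops dumps above.
# PAGE_OFFSET is an ASSUMPTION: the default 3G/1G split on i386.
PAGE_OFFSET = 0xC0000000

# (label, base register value, displacement, fault address from the log or None)
OOPSES = [
    ("Oct 24 iput+46 (esi)", 0x476F7200, 0x20, None),
    ("Oct 27 iput+46 (esi)", 0x47804200, 0x20, 0x47804220),
    ("Oct 28 clear_inode+86 (eax)", 0x47880200, 0x20, 0x47880220),
]


def analyse(base, disp, reported):
    """Compute the effective address of `mov disp(base),...` on a 32-bit CPU."""
    effective = (base + disp) & 0xFFFFFFFF
    return {
        "effective": effective,
        # None means the log did not report a fault address to compare with.
        "matches_report": None if reported is None else reported == effective,
        # A valid kernel pointer would be at or above PAGE_OFFSET.
        "kernel_pointer": base >= PAGE_OFFSET,
    }


for label, base, disp, reported in OOPSES:
    result = analyse(base, disp, reported)
    print(
        f"{label}: effective address {result['effective']:#010x}, "
        f"matches log: {result['matches_report']}, "
        f"base is a kernel pointer: {result['kernel_pointer']}"
    )
```

Under that assumption, all three base values lie below PAGE_OFFSET, while the other pointers in the dumps (c67c8800 and so on) lie above it.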

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
 PGP: 1024D/1C335860 from wwwkeys.eu.pgp.net or www.carfax.nildram.co.uk
   --- Anyone who claims their cryptographic protocol is secure is ---   
         either a genius or a fool.  Given the genius/fool ratio         
                 for our species,  the odds aren't good.                 

[-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --]


* Re: Oops in kswapd, 2.4.19 kernel and before
  2002-10-28 10:24 Oops in kswapd, 2.4.19 kernel and before Hugo Mills
@ 2002-10-28 10:47 ` Morten Helgesen
  2002-10-28 12:29 ` Andrea Arcangeli
  1 sibling, 0 replies; 5+ messages in thread
From: Morten Helgesen @ 2002-10-28 10:47 UTC (permalink / raw)
  To: Hugo Mills; +Cc: linux-kernel

Hey Hugo, 

On Mon, Oct 28, 2002 at 10:24:39AM +0000, Hugo Mills wrote:
>    Hi,
> 
>    This is the third time I've tried to report this problem, with no
> response so far. One last try. If you're not interested, please tell
> me and I won't bother you any more...
> 
>    I'm getting regular oopsen in kswapd on my 2.4.19 kernel. They
> generally appear to happen while running Amanda (a tape backup
> utility) -- although I've not identified exactly which component of
> Amanda triggers it. The machine is lightly stressed with regard to
> memory usage, although I suspect much of it is (currently) swapped out
> (I'm running postgres and apache, but they don't get much use at the
> moment):

[snip]

I think this is the same issue I reported here:
http://marc.theaimsgroup.com/?l=linux-kernel&m=103226236223247&w=2

Upgrading to 2.4.20-pre7 has solved the problem for me. I haven't
had time to look into what actually caused/fixed it.

== Morten

-- 

"Life is not for beginners" - quote from a wise person.

Best regards,
Morten Helgesen 
UNIX System Administrator & C Developer 
Nextframe AS
admin@nextframe.net / 93445641
http://www.nextframe.net


* Re: Oops in kswapd, 2.4.19 kernel and before
  2002-10-28 10:24 Oops in kswapd, 2.4.19 kernel and before Hugo Mills
  2002-10-28 10:47 ` Morten Helgesen
@ 2002-10-28 12:29 ` Andrea Arcangeli
  2002-10-28 16:45   ` Hugo Mills
  1 sibling, 1 reply; 5+ messages in thread
From: Andrea Arcangeli @ 2002-10-28 12:29 UTC (permalink / raw)
  To: Hugo Mills, Rik van Riel, LKML

On Mon, Oct 28, 2002 at 10:24:39AM +0000, Hugo Mills wrote:
>    Hi,
> 
>    This is the third time I've tried to report this problem, with no
> response so far. One last try. If you're not interested, please tell
> me and I won't bother you any more...
> 
>    I'm getting regular oopsen in kswapd on my 2.4.19 kernel. They
> generally appear to happen while running Amanda (a tape backup

if it only happens while or after running Amanda, it may be a tape
driver bug.

>    Decoded oopsen are below (they _are_ decoded with the right system
> maps, despite ksymoops's concerns). If there's anything else that's
> needed in order to track this down, please let me know.

the oopses show that some inode was corrupted. They don't tell us who is
corrupting it, but most likely it is not a piece of common code (more
likely a driver or a non-mainstream feature, otherwise we should be able
to reproduce it). You should try to localize the bug to a piece of code,
for example by making 100% sure that it triggers as soon as you start
amanda. Then you can try to back up using another device (not tape) and
see if you can still reproduce. Finally, you can try to use older or newer
2.4 drivers for the tape and see if there's any change that fixes the
problem in the old/new drivers. Of course it isn't certain at all that it
is the tape, I'm just guessing because you said it happens while backing
up to the tape.
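
One way to keep track of these experiments is to pull the faulting function out of the kernel log after each test and tally it per configuration. This is only a minimal sketch: the function names oops_functions and tally and the run label are made up for illustration, and the pattern assumes the log uses the same "EIP: 0010:[symbol+offset/size]" layout as the dumps in this thread.

```python
import re
from collections import Counter

# Matches e.g. "kernel: EIP:    0010:[iput+46/432]" and captures "iput".
# ASSUMPTION: the log has the same layout as the oopses quoted in this thread.
EIP_RE = re.compile(r"kernel: EIP:\s+[0-9a-f]{4}:\[(\w+)\+\d+/\d+\]")


def oops_functions(log_text):
    """Return the function named on each EIP line found in the log text."""
    return EIP_RE.findall(log_text)


def tally(runs):
    """Map each test label to a count of the functions its log oopsed in.

    `runs` is a dict of label -> kernel log text captured after that test.
    """
    return {label: Counter(oops_functions(text)) for label, text in runs.items()}


# Two log lines copied from the reports in this thread; the label is hypothetical.
sample = (
    "Oct 27 01:46:00 vlad kernel: EIP:    0010:[iput+46/432]    Not tainted\n"
    "Oct 28 08:04:49 vlad kernel: EIP:    0010:[clear_inode+86/168]    Not tainted\n"
)
print(tally({"amflush to tape": sample}))
```

Comparing the tallies for a tape run, a run to another device, and a run on a different kernel version would show whether the oops follows the tape driver or not.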

Andrea


* Re: Oops in kswapd, 2.4.19 kernel and before
  2002-10-28 12:29 ` Andrea Arcangeli
@ 2002-10-28 16:45   ` Hugo Mills
  2002-10-28 19:10     ` Andrea Arcangeli
  0 siblings, 1 reply; 5+ messages in thread
From: Hugo Mills @ 2002-10-28 16:45 UTC (permalink / raw)
  To: Andrea Arcangeli; +Cc: linux-kernel

[-- Attachment #1: Type: text/plain, Size: 2399 bytes --]

On Mon, Oct 28, 2002 at 01:29:01PM +0100, Andrea Arcangeli wrote:
> On Mon, Oct 28, 2002 at 10:24:39AM +0000, Hugo Mills wrote:
> >    I'm getting regular oopsen in kswapd on my 2.4.19 kernel. They
> > generally appear to happen while running Amanda (a tape backup
> 
> if it only happens while or after running Amanda, it may be a tape
> driver bug.

   I may have seen it (once?) before without touching the tape drive,
although I'm not certain. I shall see if I can reproduce without use
of the tape.

> >    Decoded oopsen are below (they _are_ decoded with the right system
> > maps, despite ksymoops's concerns). If there's anything else that's
> > needed in order to track this down, please let me know.
> 
> the oopses shows some inode was corrupted, it doesn't tell us who is
> corrupting them but most likely it is not a piece of common code (a driver
> or a non mainstream feature or we should be able to reproduce it) You
> should try to localize the bug to a piece of code, by for example making
> 100% sure that it triggers as soon as you start amanda. 

   It's not certain. I appear to have triggered it this morning on the
_third_ consecutive run of amflush. Again, I'll test more carefully.

> Then you can try to backup using another device (not tape) and see
> if you can still reproduce. finally you can try to use older or
> newer 2.4 drivers for the tape and see if there's any change that
> fixes the problem in the old/new drivers. Of course it isn't certain
> at all that it is the tape, I'm just guessing because you said it
> happens while backing up to the tape.

   I've definitely seen the problem throughout the 2.4 series. I don't
recall what the first 2.4 kernel I used was, but it was definitely
there in all mainstream kernels (and those -ac kernels I tried) from
about 2.4.14 onwards. I'll try 2.4.20-preX and report on that as well.

   Thanks for your help. It may be a week or two before I can get all
these tests completed, but I shall definitely report back when I'm
done.

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
 PGP: 1024D/1C335860 from wwwkeys.eu.pgp.net or www.carfax.nildram.co.uk
   --- Anyone who claims their cryptographic protocol is secure is ---   
         either a genius or a fool.  Given the genius/fool ratio         
                 for our species,  the odds aren't good.                 

[-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --]


* Re: Oops in kswapd, 2.4.19 kernel and before
  2002-10-28 16:45   ` Hugo Mills
@ 2002-10-28 19:10     ` Andrea Arcangeli
  0 siblings, 0 replies; 5+ messages in thread
From: Andrea Arcangeli @ 2002-10-28 19:10 UTC (permalink / raw)
  To: Hugo Mills, linux-kernel

On Mon, Oct 28, 2002 at 04:45:40PM +0000, Hugo Mills wrote:
> On Mon, Oct 28, 2002 at 01:29:01PM +0100, Andrea Arcangeli wrote:
> > On Mon, Oct 28, 2002 at 10:24:39AM +0000, Hugo Mills wrote:
> > >    I'm getting regular oopsen in kswapd on my 2.4.19 kernel. They
> > > generally appear to happen while running Amanda (a tape backup
> > 
> > if it only happens while or after running Amanda, it may be a tape
> > driver bug.
> 
>    I may have seen it (once?) before without touching the tape drive,
> although I'm not certain. I shall see if I can reproduce without use
> of the tape.

perfect, thanks.

> 
> > >    Decoded oopsen are below (they _are_ decoded with the right system
> > > maps, despite ksymoops's concerns). If there's anything else that's
> > > needed in order to track this down, please let me know.
> > 
> > the oopses shows some inode was corrupted, it doesn't tell us who is
> > corrupting them but most likely it is not a piece of common code (a driver
> > or a non mainstream feature or we should be able to reproduce it) You
> > should try to localize the bug to a piece of code, by for example making
> > 100% sure that it triggers as soon as you start amanda. 
> 
>    It's not certain. I appear to have triggered it this morning on the
> _third_ consecutive run of amflush. Again, I'll test more carefully.

You may want to start a very intensive kernel stress test right after
doing the backup. If it corrupts memory, you won't notice until you
actually use the corrupted memory. Other times it may corrupt user or
free memory and in such cases you won't get an oops.
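
A rough idea of what such a stress test could look like, given that the traces above are in prune_dcache and prune_icache: first touch a large number of files so the dentry and inode caches fill up, then allocate and touch enough memory that kswapd has to shrink those caches again. This is a sketch only, not an established test. The function names are made up, it assumes a Python 3 interpreter is available on the machine, and the same idea works in any language.

```python
import os


def fill_inode_caches(root, limit=None):
    """lstat every entry under `root` so the kernel caches its dentry and inode.

    Returns the number of entries touched. `limit` optionally stops early.
    """
    seen = 0
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            try:
                os.lstat(os.path.join(dirpath, name))
            except OSError:
                continue  # entry vanished or is unreadable; skip it
            seen += 1
            if limit is not None and seen >= limit:
                return seen
    return seen


def apply_memory_pressure(megabytes, page_size=4096):
    """Allocate `megabytes` MiB and write one byte per page so it is really used.

    Returns the number of pages touched. The 4096-byte page size is an
    assumption that holds for i386.
    """
    buf = bytearray(megabytes * 1024 * 1024)
    touched = 0
    for offset in range(0, len(buf), page_size):
        buf[offset] = 1
        touched += 1
    return touched
```

For example, calling fill_inode_caches("/usr") and then apply_memory_pressure with a size close to the machine's RAM, repeated in a loop right after the backup, should push kswapd through the same cache-pruning paths. The path and size here are placeholders to adjust for the machine.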

> > Then you can try to backup using another device (not tape) and see
> > if you can still reproduce. finally you can try to use older or
> > newer 2.4 drivers for the tape and see if there's any change that
> > fixes the problem in the old/new drivers. Of course it isn't certain
> > at all that it is the tape, I'm just guessing because you said it
> > happens while backing up to the tape.
> 
>    I've definitely seen the problem throughout the 2.4 series. I don't
> recall what the first 2.4 kernel I used was, but it was definitely
> there in all mainstream kernels (and those -ac kernels I tried) from
> about 2.4.14 onwards. I'll try 2.4.20-preX and report on that as well.
> 
>    Thanks for your help. It may be a week or two before I can get all
> these tests completed, but I shall definitely report back when I'm
> done.

Ok.

Andrea

