public inbox for linux-kernel@vger.kernel.org
* 2.4.24 Paging Fault, Cache tries to swap with no swap partition
@ 2004-02-14 17:06 Ross Dickson
  2004-02-14 20:33 ` vda
  0 siblings, 1 reply; 5+ messages in thread
From: Ross Dickson @ 2004-02-14 17:06 UTC (permalink / raw)
  To: linux-kernel

Greetings,

I have an imaging system writing files to removable hard drives.
It boots from Compact Flash and runs from RAM drives, so I usually have no swap
partition or file.

Recently I upgraded kernel from 2.4.20 to 2.4.24.

The system boots with "mem=460M" (512M of RAM fitted) and starts with about
400M free. After recording for a while, Cached grows until only about
4Mb of MemFree is left.

On a hot 38C day it started Oopsing with paging faults. It runs the
same 2 programs all day, gathering and compressing images.
Sorry, I have no detail on the Oops at the moment; the computer is in a vehicle
and does not normally have a screen. From memory, it couldn't allocate a virtual
page.

I found if I put in a 16Mb ram drive as swap then it would grab
roughly 1.4Mb of it on occasion and keep it until recording stopped
for a while. SwapCached is either 0Kb or 1024Kb, not anything else.

Is this behaviour expected - to require a swap file? 
Can the paging cache be tuned in /proc or somewhere to prevent it being so 
greedy as to want more memory than the machine has?

Is the quickest fix to give it more ram. I read on another posting that with
greater than 512Mb the cache won't grab any more?

Regards
Ross.


* Re: 2.4.24 Paging Fault, Cache tries to swap with no swap partition
  2004-02-14 17:06 2.4.24 Paging Fault, Cache tries to swap with no swap partition Ross Dickson
@ 2004-02-14 20:33 ` vda
  2004-02-15 13:49   ` Ross Dickson
  0 siblings, 1 reply; 5+ messages in thread
From: vda @ 2004-02-14 20:33 UTC (permalink / raw)
  To: ross, linux-kernel

On Saturday 14 February 2004 19:06, Ross Dickson wrote:
> I have an imaging system writing files to removable hard drives.
> Compact Flash boot with ram drives so I usually have no swap partition or
> file.
>
> Recently I upgraded kernel from 2.4.20 to 2.4.24.
>
> System has "mem=460M" (512M ram fitted) and starts with about
> 400M free. After recording for a while the Cached ram acquires all
> but about 4Mb MemFree.
>
> On a hot 38C day it started Oops'ing re paging memory. It runs the

Too vague.
Do you have any logging? At least a circular buffer? Anything?

> same 2 programs all day gathering and compressing images.
> Sorry I have no detail on the Oops at the moment, computer is in a vehicle
> and does not normally have a screen. From memory it couldn't allocate a
> virtual page.
>
> I found if I put in a 16Mb ram drive as swap then it would grab
> roughly 1.4Mb of it on occasion and keep it until recording stopped
> for a while. SwapCached is either 0Kb or 1024Kb, not anything else.

If swap is active, some of it may be used even when box is not
heavily loaded. That's normal.

> Is this behaviour expected - to require a swap file?

No.

> Can the paging cache be tuned in /proc or somewhere to prevent it being so
> greedy as to want more memory than the machine has?

Maybe. But you should concentrate on finding where exactly it oopsed.

> Is the quickest fix to give it more ram. I read on another posting that
> with greater than 512Mb the cache won't grab any more?

Please don't succumb to 'add more RAM' syndrome. 460 megs should be fine
for you. I'd say better find the root of the problem.
--
vda



* Re: 2.4.24 Paging Fault, Cache tries to swap with no swap partition
  2004-02-14 20:33 ` vda
@ 2004-02-15 13:49   ` Ross Dickson
  2004-02-16 10:25     ` Ross Dickson
  0 siblings, 1 reply; 5+ messages in thread
From: Ross Dickson @ 2004-02-15 13:49 UTC (permalink / raw)
  To: linux-kernel; +Cc: vda

On Sunday 15 February 2004 06:33, you wrote:
> On Saturday 14 February 2004 19:06, Ross Dickson wrote:
> > I have an imaging system writing files to removable hard drives.
> > Compact Flash boot with ram drives so I usually have no swap partition or
> > file.
> >
> > Recently I upgraded kernel from 2.4.20 to 2.4.24.
> >
> > System has "mem=460M" (512M ram fitted) and starts with about
> > 400M free. After recording for a while the Cached ram acquires all
> > but about 4Mb MemFree.
> >
> > On a hot 38C day it started Oops'ing re paging memory. It runs the
> 
> Too vague.
> Do you have any logging? At least a circular buffer? Anything?

Unfortunately not much. I was hot, tired, and not at my best; I should have
grabbed it when I had the chance. All I grabbed was a partial code string
at the bottom of the Oops, which I doubt is of any benefit without the rest:
8b 5f 04 8d 77 08 83 eb 18 8b
Oh yeah, it killed init too.
The system defaults to not logging to permanent storage: the flash would die over
time from writes, and the info would be meaningless to the customer on their
removable hard drive. Note to self: must change that. I'm sure the customer could
spare some space.

I assumed I could reproduce the fault today, but it wouldn't fault.
It first misbehaved on Friday 13th and I reproduced it on Sat 14th.
Self cured??? Sunday 15th. Doh!!!

> 
> > same 2 programs all day gathering and compressing images.
> > Sorry I have no detail on the Oops at the moment, computer is in a vehicle
> > and does not normally have a screen. From memory it couldn't allocate a
> > virtual page.
> >
> > I found if I put in a 16Mb ram drive as swap then it would grab
> > roughly 1.4Mb of it on occasion and keep it until recording stopped
> > for a while. SwapCached is either 0Kb or 1024Kb, not anything else.
> 
> If swap is active, some of it may be used even when box is not
> heavily loaded. That's normal.
> 
> > Is this behaviour expected - to require a swap file?
> 
> No.
> 
> > Can the paging cache be tuned in /proc or somewhere to prevent it being so
> > greedy as to want more memory than the machine has?
> 
> Maybe. But you should concentrate on finding where exactly it oopsed.

I note MemFree stabilises at around 4Mb when running OK. Given that it only wanted
an extra 1Mb of cache swap, can I write something to /proc/sys/vm/????? to force
it to stabilise at around 10Mb or 20Mb? Otherwise, can I change a constant
and recompile the kernel to achieve the same? It might give more headroom when
the event occurs.

> 
> > Is the quickest fix to give it more ram. I read on another posting that
> > with greater than 512Mb the cache won't grab any more?
> 
> Please don't succumb to 'add more RAM' syndrome. 460 megs should be fine
> for you. I'd say better find the root of the problem.

I admit it, I have since tried 'add more RAM', but a couple of capture card devices
did not like more than about 800Mb, so I pulled the stick back out. It ran quite
well in 256Mb with the old kernel for about a year, so it is puzzling.

> vda
> 

Thanks for the response
Regards
Ross.



* Re: 2.4.24 Paging Fault, Cache tries to swap with no swap partition
  2004-02-15 13:49   ` Ross Dickson
@ 2004-02-16 10:25     ` Ross Dickson
  2004-02-16 13:45       ` 2.4.24 Paging Fault, Source Line located in slab.c, kmem_cache_reap() Ross Dickson
  0 siblings, 1 reply; 5+ messages in thread
From: Ross Dickson @ 2004-02-16 10:25 UTC (permalink / raw)
  To: linux-kernel; +Cc: vda

On Sunday 15 February 2004 23:49, Ross Dickson wrote:
> On Sunday 15 February 2004 06:33, you wrote:
> > On Saturday 14 February 2004 19:06, Ross Dickson wrote:
> > > I have an imaging system writing files to removable hard drives.
> > > Compact Flash boot with ram drives so I usually have no swap partition or
> > > file.
> > >
> > > Recently I upgraded kernel from 2.4.20 to 2.4.24.
The board is a KM18G Pro (nforce2, dual memory mode) with an AMD 2400XP. The kernel
has the Preempt, Low latency, and 64Bit jiffies 1000Hz patches applied.

I found some articles about memory overcommitment, checked the source, and saw
strict in use for arm systems (no swap), so this time I thought I would try

echo 1 > /proc/sys/vm/overcommit_memory

I got another Oops under circumstances equivalent to the earlier ones (no swap).
I ran the Oops through ksymoops on another machine with the same kernel; results
are below. At this point I think the trigger may be a slow (bad blocks?) 80Gb hard
drive that the files are being written to. The PCI bus is quite busy with imaging
from 3 cameras on two capture cards (bttv and meteor II mc).

> > > Can the paging cache be tuned in /proc or somewhere to prevent it being so
> > > greedy as to want more memory than the machine has?
> > 
> > Maybe. But you should concentrate on finding where exactly it oopsed.
...snip...
> ......I added a 16mb ram drive swap (see earlier posting)
> I note memfree stabilises at around 4Mb when running OK, given it only wanted
> an extra 1Mb cache swap, can I cat something to /proc/sys/vm/????? to force
> it to stabilise at around 10Mb or 20Mb? Otherwise can I change a constant
> and recompile kernel to achieve same? It might help give more headroom when
> the event occurs.
> 
> > 
> > > Is the quickest fix to give it more ram. I read on another posting that
> > > with greater than 512Mb the cache won't grab any more?
> > 
> > Please don't succumb to 'add more RAM' syndrome. 460 megs should be fine
> > for you. I'd say better find the root of the problem.
...snip...
> > vda
> > 

Thanks for the response
Regards
Ross.

The system was booted with "mem=450" for this Oops.

ksymoops 2.4.8 on i686 2.4.24-rd.  Options used
     -V (default)
     -k /proc/ksyms (default)
     -L (specified)
     -o /lib/modules/2.4.24-rd/ (default)
     -m /boot/System.map (specified)

Unable to handle kernel paging request at virtual address 6a65656a
c0133f20
*pde = 00000000
Oops: 0000
CPU:    0
EIP:    0010:[<c0133f20>]    Not tainted
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00010883
eax: 6a65656a   ebx: 000000f5   ecx: c14dfdd4   edx: c14dfde4
esi: 00000000   edi: 00000008   ebp: c14dfe40   esp: dc16df38
ds: 0018   es: 0018   ss: 0018
Process kswapd (pid: 4, stackpage=dc16d000)
Stack: 000001d0 00000001 c14dfdd4 00000000 00000005 00000005 00000020 000001d0 
       c032c4d8 c032c4d8 c0134ebc dc16df84 000001d0 0000003c 00000020 c0134f58 
       dc16df84 dc16c000 00000000 00000000 c032c4d8 dc16c000 c032c400 00000000 
Call Trace:    [<c0134ebc>] [<c0134f58>] [<c01350f6>] [<c0135169>] [<c013529d>]
  [<c0135210>] [<c0105000>] [<c01057db>] [<c0135210>]
Code: 8b 00 43 39 d0 75 f9 8b 44 24 08 89 da 8b 70 24 8b 40 44 89 


>>EIP; c0133f20 <kmem_cache_reap+80/1f0>   <=====

>>ecx; c14dfdd4 <_end+1115228/1e4db4b4>
>>edx; c14dfde4 <_end+1115238/1e4db4b4>
>>ebp; c14dfe40 <_end+1115294/1e4db4b4>
>>esp; dc16df38 <_end+1bda338c/1e4db4b4>

Trace; c0134ebc <shrink_caches+1c/60>
Trace; c0134f58 <try_to_free_pages_zone+58/e0>
Trace; c01350f6 <kswapd_balance_pgdat+56/b0>
Trace; c0135169 <kswapd_balance+19/30>
Trace; c013529d <kswapd+8d/b0>
Trace; c0135210 <kswapd+0/b0>
Trace; c0105000 <_stext+0/0>
Trace; c01057db <arch_kernel_thread+2b/40>
Trace; c0135210 <kswapd+0/b0>

Code;  c0133f20 <kmem_cache_reap+80/1f0>
00000000 <_EIP>:
Code;  c0133f20 <kmem_cache_reap+80/1f0>   <=====
   0:   8b 00                     mov    (%eax),%eax   <=====
Code;  c0133f22 <kmem_cache_reap+82/1f0>
   2:   43                        inc    %ebx
Code;  c0133f23 <kmem_cache_reap+83/1f0>
   3:   39 d0                     cmp    %edx,%eax
Code;  c0133f25 <kmem_cache_reap+85/1f0>
   5:   75 f9                     jne    0 <_EIP>
Code;  c0133f27 <kmem_cache_reap+87/1f0>
   7:   8b 44 24 08               mov    0x8(%esp,1),%eax
Code;  c0133f2b <kmem_cache_reap+8b/1f0>
   b:   89 da                     mov    %ebx,%edx
Code;  c0133f2d <kmem_cache_reap+8d/1f0>
   d:   8b 70 24                  mov    0x24(%eax),%esi
Code;  c0133f30 <kmem_cache_reap+90/1f0>
  10:   8b 40 44                  mov    0x44(%eax),%eax
Code;  c0133f33 <kmem_cache_reap+93/1f0>
  13:   89 00                     mov    %eax,(%eax)

Memory looks like this once the programs have been started.
The system was booted with "mem=460" for these readings.
        total:    used:    free:  shared: buffers:  cached:
Mem:  473899008 22151168 451747840        0   327680  7737344
Swap:        0        0        0
MemTotal:       462792 kB
MemFree:        441160 kB
MemShared:           0 kB
Buffers:           320 kB
Cached:           7556 kB
SwapCached:          0 kB
Active:           2320 kB
Inactive:         7072 kB
HighTotal:           0 kB
HighFree:            0 kB
LowTotal:       462792 kB
LowFree:        441160 kB
SwapTotal:           0 kB
SwapFree:            0 kB

And it looks like this close to Oops time:
        total:    used:    free:  shared: buffers:  cached:
Mem:  473899008 469348352  4550656        0   831488 420454400
Swap:        0        0        0
MemTotal:       462792 kB
MemFree:          4444 kB
MemShared:           0 kB
Buffers:           812 kB
Cached:         410600 kB
SwapCached:          0 kB
Active:           5064 kB
Inactive:       410968 kB
HighTotal:           0 kB
HighFree:            0 kB
LowTotal:       462792 kB
LowFree:          4444 kB
SwapTotal:           0 kB
SwapFree:            0 kB



* Re: 2.4.24 Paging Fault, Source Line located in slab.c, kmem_cache_reap()
  2004-02-16 10:25     ` Ross Dickson
@ 2004-02-16 13:45       ` Ross Dickson
  0 siblings, 0 replies; 5+ messages in thread
From: Ross Dickson @ 2004-02-16 13:45 UTC (permalink / raw)
  To: linux-kernel; +Cc: vda

I tracked the Oops down to a source line in kmem_cache_reap().
...............................................................
This Line is number 1784 in my file.........................
		full_free = 0;
		p = searchp->slabs_free.next;
		while (p != &searchp->slabs_free) {  
#if DEBUG
			slabp = list_entry(p, slab_t, list);

			if (slabp->inuse)
				BUG();
#endif
			full_free++;
			p = p->next; <======Oops Here
		}
............................
objdump for above code section:
The Oops identified the offending instruction at kmem_cache_reap+0x80, which is
address af0 in this listing.
............................
     add:       8b 4c 24 08             mov    0x8(%esp,1),%ecx
     ae1:       31 db                   xor    %ebx,%ebx
     ae3:       8b 41 10                mov    0x10(%ecx),%eax
     ae6:       89 ca                   mov    %ecx,%edx
     ae8:       83 c2 10                add    $0x10,%edx
     aeb:       39 d0                   cmp    %edx,%eax
     aed:       74 08                   je     af7 <kmem_cache_reap+0x87>
     aef:       90                      nop
     af0:       8b 00                   mov    (%eax),%eax {<=====Oops}
     af2:       43                      inc    %ebx
     af3:       39 d0                   cmp    %edx,%eax
     af5:       75 f9                   jne    af0 <kmem_cache_reap+0x80>
     af7:       8b 44 24 08             mov    0x8(%esp,1),%eax
...............................
I do not know how this part of the kernel works.
Have we walked off the end of a list or something?
Can anyone help with a theory or still better a fix? 

Hopefully thanks in advance,
Ross.



