public inbox for linux-kernel@vger.kernel.org
* 2.6.21 frozen for a few minutes, swapping to disk
@ 2007-04-29 10:28 Miguel Figueiredo
  2007-04-30  7:30 ` Andrew Morton
  0 siblings, 1 reply; 5+ messages in thread
From: Miguel Figueiredo @ 2007-04-29 10:28 UTC (permalink / raw)
  To: linux-kernel

Hi all,

Today, with 2.6.21, my laptop behaved very oddly. It started
writing to disk for a few minutes with no interactivity at all (no
screen redraw, only the HDD LED on). It's the first time I have seen
the OOM-killer start to kill programs.

It was totally unresponsive for minutes; after it came back to life it
had a load of ~19.0 and 300+ MB in swap (the first time I have seen this).

It's an HP Pavilion, Core Duo 2.0 GHz, 1 GB RAM.

kern.log details: 
http://www.debianpt.org/~elmig/pool/kernel/20070429/kern.log
.config: http://www.debianpt.org/~elmig/pool/kernel/20070429/2.6.21.config
dmesg: http://www.debianpt.org/~elmig/pool/kernel/20070429/dmesg

As this is the first time it has happened and it felt odd, I am reporting it.

If additional info is needed please CC me, as I am not on the list.

-- 

Com os melhores cumprimentos/Best regards,

Miguel Figueiredo
http://www.DebianPT.org


* Re: 2.6.21 frozen for a few minutes, swapping to disk
  2007-04-29 10:28 2.6.21 frozen for a few minutes, swapping to disk Miguel Figueiredo
@ 2007-04-30  7:30 ` Andrew Morton
  2007-05-01  5:42   ` Nick Piggin
  0 siblings, 1 reply; 5+ messages in thread
From: Andrew Morton @ 2007-04-30  7:30 UTC (permalink / raw)
  To: Miguel Figueiredo; +Cc: linux-kernel

On Sun, 29 Apr 2007 11:28:05 +0100 Miguel Figueiredo <elmig@debianpt.org> wrote:

> Hi all,
> 
> Today, with 2.6.21, my laptop behaved very oddly. It started
> writing to disk for a few minutes with no interactivity at all (no
> screen redraw, only the HDD LED on). It's the first time I have seen
> the OOM-killer start to kill programs.
> 
> It was totally unresponsive for minutes; after it came back to life it
> had a load of ~19.0 and 300+ MB in swap (the first time I have seen this).
> 
> It's an HP Pavilion, Core Duo 2.0 GHz, 1 GB RAM.
> 
> kern.log details: 
> http://www.debianpt.org/~elmig/pool/kernel/20070429/kern.log
> .config: http://www.debianpt.org/~elmig/pool/kernel/20070429/2.6.21.config
> dmesg: http://www.debianpt.org/~elmig/pool/kernel/20070429/dmesg
> 
> As this is the first time it has happened and it felt odd, I am reporting it.
> 
> If additional info is needed please CC me, as I am not on the list.
> 

hm, a genuine oom on an all-ext3 data=ordered i386 system, just like a
million other people.  How very weird.

I assume all those pages on the LRU are pagecache pages which for some
reason we're unable to reclaim.

If some privileged application went berserk mlock()ing everything then that
might explain it.  It sounds improbable, but then, something improbable has
happened.

We cleverly managed to not display the pagecache totals in the oom-killer
output.  Could you please take a copy of /proc/meminfo after an
oom-killing and send that?  And /proc/vmstat, I guess.
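
(Editorial illustration, not part of the original message.) One way to
capture the two files requested above is a small script that parses them
into name/value pairs so snapshots can be compared later. This is only a
sketch: the function names parse_meminfo and parse_vmstat are hypothetical,
and it assumes the usual "Name:   value kB" layout of /proc/meminfo and the
"name value" layout of /proc/vmstat.

```python
import re


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: integer value}.

    Most values are in kB; a few counters (e.g. HugePages_Total) are plain
    counts. The unit suffix, if any, is ignored.
    """
    fields = {}
    for line in text.splitlines():
        match = re.match(r"^(\S+):\s+(\d+)", line)
        if match:
            fields[match.group(1)] = int(match.group(2))
    return fields


def parse_vmstat(text):
    """Parse /proc/vmstat-style text ("name value" per line) into a dict."""
    fields = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            fields[parts[0]] = int(parts[1])
    return fields


def read_text(path):
    """Read a whole file; intended for /proc/meminfo and /proc/vmstat."""
    with open(path) as handle:
        return handle.read()
```

On the affected machine, parse_meminfo(read_text("/proc/meminfo")) taken
right after an oom-killing would give the pagecache figures (Cached, Mapped,
and so on) that the oom-killer output omits.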

If you're keen, we could eliminate the mlock possibility by adding this:

--- a/mm/mlock.c~a
+++ a/mm/mlock.c
@@ -127,6 +127,8 @@ asmlinkage long sys_mlock(unsigned long 
 	unsigned long lock_limit;
 	int error = -ENOMEM;
 
+	return 0;
+
 	if (!can_do_mlock())
 		return -EPERM;
 
@@ -151,6 +153,8 @@ asmlinkage long sys_munlock(unsigned lon
 {
 	int ret;
 
+	return 0;
+
 	down_write(&current->mm->mmap_sem);
 	len = PAGE_ALIGN(len + (start & ~PAGE_MASK));
 	start &= PAGE_MASK;
_
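
(Editorial illustration, not part of the original message.) The patch above
rules out mlock() by disabling it in the kernel. A less invasive check,
sketched here under assumptions, is to read the VmLck line that
/proc/<pid>/status reports for each process and list the processes with a
non-zero value. The helper names are hypothetical, and whether VmLck
accounts for every way of pinning memory is not assumed here, so this
complements the patch rather than replacing it.

```python
import os


def locked_kb(status_text):
    """Return the VmLck value (in kB) from /proc/<pid>/status text.

    Returns 0 when the line is absent, as it is for kernel threads.
    """
    for line in status_text.splitlines():
        if line.startswith("VmLck:"):
            return int(line.split()[1])
    return 0


def processes_with_locked_memory(proc_root="/proc"):
    """List (pid, locked_kB) pairs with locked memory, largest first."""
    results = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a process directory
        status_path = os.path.join(proc_root, entry, "status")
        try:
            with open(status_path) as handle:
                text = handle.read()
        except OSError:
            continue  # process exited or status is not readable
        kb = locked_kb(text)
        if kb > 0:
            results.append((int(entry), kb))
    return sorted(results, key=lambda item: item[1], reverse=True)
```

If processes_with_locked_memory() returns an empty list or only small
values, a runaway mlock() caller becomes a less likely explanation.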



* Re: 2.6.21 frozen for a few minutes, swapping to disk
  2007-04-30  7:30 ` Andrew Morton
@ 2007-05-01  5:42   ` Nick Piggin
  2007-05-01  5:49     ` Andrew Morton
  0 siblings, 1 reply; 5+ messages in thread
From: Nick Piggin @ 2007-05-01  5:42 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Miguel Figueiredo, linux-kernel

Andrew Morton wrote:
> On Sun, 29 Apr 2007 11:28:05 +0100 Miguel Figueiredo <elmig@debianpt.org> wrote:
> 
> 
>>Hi all,
>>
>>Today, with 2.6.21, my laptop behaved very oddly. It started
>>writing to disk for a few minutes with no interactivity at all (no
>>screen redraw, only the HDD LED on). It's the first time I have seen
>>the OOM-killer start to kill programs.
>>
>>It was totally unresponsive for minutes; after it came back to life it
>>had a load of ~19.0 and 300+ MB in swap (the first time I have seen this).
>>
>>It's an HP Pavilion, Core Duo 2.0 GHz, 1 GB RAM.
>>
>>kern.log details: 
>>http://www.debianpt.org/~elmig/pool/kernel/20070429/kern.log
>>.config: http://www.debianpt.org/~elmig/pool/kernel/20070429/2.6.21.config
>>dmesg: http://www.debianpt.org/~elmig/pool/kernel/20070429/dmesg
>>
>>As this is the first time it has happened and it felt odd, I am reporting it.
>>
>>If additional info is needed please CC me, as I am not on the list.
>>
> 
> 
> hm, a genuine oom on an all-ext3 data=ordered i386 system, just like a
> million other people.  How very weird.
> 
> I assume all those pages on the LRU are pagecache pages which for some
> reason we're unable to reclaim.

It looks like it used up all swap? I'd guess a memory leak in some
application, or maybe a page refcount leak somewhere.


> 
> If some privileged application went berserk mlock()ing everything then that
> might explain it.  It sounds improbable, but then, something improbable has
> happened.
> 
> We cleverly managed to not display the pagecache totals in the oom-killer
> output.  Could you please take a copy of /proc/meminfo after an
> oom-killing and send that?  And /proc/vmstat, I guess.
> 
> If you're keen, we could eliminate the mlock possibility by adding this:
> 
> --- a/mm/mlock.c~a
> +++ a/mm/mlock.c
> @@ -127,6 +127,8 @@ asmlinkage long sys_mlock(unsigned long 
>  	unsigned long lock_limit;
>  	int error = -ENOMEM;
>  
> +	return 0;
> +
>  	if (!can_do_mlock())
>  		return -EPERM;
>  
> @@ -151,6 +153,8 @@ asmlinkage long sys_munlock(unsigned lon
>  {
>  	int ret;
>  
> +	return 0;
> +
>  	down_write(&current->mm->mmap_sem);
>  	len = PAGE_ALIGN(len + (start & ~PAGE_MASK));
>  	start &= PAGE_MASK;
> _
> 

-- 
SUSE Labs, Novell Inc.


* Re: 2.6.21 frozen for a few minutes, swapping to disk
  2007-05-01  5:42   ` Nick Piggin
@ 2007-05-01  5:49     ` Andrew Morton
  2007-05-01  5:58       ` Nick Piggin
  0 siblings, 1 reply; 5+ messages in thread
From: Andrew Morton @ 2007-05-01  5:49 UTC (permalink / raw)
  To: Nick Piggin; +Cc: Miguel Figueiredo, linux-kernel

On Tue, 01 May 2007 15:42:30 +1000 Nick Piggin <nickpiggin@yahoo.com.au> wrote:

> > hm, a genuine oom on an all-ext3 data=ordered i386 system, just like a
> > million other people.  How very weird.
> > 
> > I assume all those pages on the LRU are pagecache pages which for some
> > reason we're unable to reclaim.
> 
> It looks like it used up all swap? I'd guess a memory leak in some
> application, or maybe a page refcount leak somewhere.

yes, I missed that.   The number of mapped pages is tiny so the thing has
been trying to swap out like.

The question is: how much memory is free after the oom-killing storm?
If it's "lots" then it's probably an application problem.  If it's
"not much" then perhaps there's a kernel leak.

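(Editorial illustration, not part of the original message.) The "lots" versus
"not much" test above can be sketched as a comparison of MemFree against
MemTotal from /proc/meminfo, taken after the oom-killing storm. The function
name and the 0.5 cut-off are assumptions made purely for illustration; the
thread gives no numeric threshold. Only MemFree is counted, since Cached can
include pages that are not freeable without swap.

```python
def classify_after_oom(meminfo, lots_threshold=0.5):
    """Classify a post-OOM snapshot as an application or possible kernel leak.

    meminfo is a dict of kB values keyed by /proc/meminfo field names.
    lots_threshold is a hypothetical cut-off for what counts as "lots"
    of free memory; it is not taken from the thread.
    """
    total = meminfo["MemTotal"]
    free_fraction = meminfo.get("MemFree", 0) / total
    if free_fraction >= lots_threshold:
        # Killing tasks released the memory, so a task was holding it.
        verdict = "probably an application problem"
    else:
        # Memory stayed in use after the tasks died.
        verdict = "perhaps a kernel leak"
    return free_fraction, verdict
```

For example, a snapshot with MemTotal of 1000000 kB and MemFree of 800000 kB
falls on the "application" side, while one with only 50000 kB free falls on
the "kernel leak" side.
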

* Re: 2.6.21 frozen for a few minutes, swapping to disk
  2007-05-01  5:49     ` Andrew Morton
@ 2007-05-01  5:58       ` Nick Piggin
  0 siblings, 0 replies; 5+ messages in thread
From: Nick Piggin @ 2007-05-01  5:58 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Miguel Figueiredo, linux-kernel

Andrew Morton wrote:
> On Tue, 01 May 2007 15:42:30 +1000 Nick Piggin <nickpiggin@yahoo.com.au> wrote:
> 
> 
>>>hm, a genuine oom on an all-ext3 data=ordered i386 system, just like a
>>>million other people.  How very weird.
>>>
>>>I assume all those pages on the LRU are pagecache pages which for some
>>>reason we're unable to reclaim.
>>
>>It looks like it used up all swap? I'd guess a memory leak in some
>>application, or maybe a page refcount leak somewhere.
> 
> 
> yes, I missed that.   The number of mapped pages is tiny so the thing has
> been trying to swap out like.

I didn't quite parse this :)

If the memory is leaking slowly, it could be eventually pushing
everything out to swap without having a large amount of mapped pages.

Or if something is slowly writing stuff to tmpfs, that may not show
up in mapped pages either.


> The question is: how much memory is free after the oom-killing storm?
> If it's "lots" then it's probably an application problem.  If it's
> "not much" then perhaps there's a kernel leak.

Yeah, or a tmpfs filesystem being filled up (what does `df` say?).
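
(Editorial illustration, not part of the original message.) The `df` check
can also be scripted. The sketch below picks the tmpfs entries out of
/proc/mounts-style text and asks os.statvfs() how full a mount point is.
The helper names are hypothetical, and it assumes the standard
"device mountpoint fstype options ..." field order of /proc/mounts.

```python
import os


def tmpfs_mount_points(mounts_text):
    """Return the mount points whose filesystem type is tmpfs."""
    points = []
    for line in mounts_text.splitlines():
        parts = line.split()
        if len(parts) >= 3 and parts[2] == "tmpfs":
            points.append(parts[1])
    return points


def usage_kb(path):
    """Return (used_kB, total_kB) for the filesystem containing path."""
    stats = os.statvfs(path)
    total = stats.f_blocks * stats.f_frsize // 1024
    free = stats.f_bfree * stats.f_frsize // 1024
    return total - free, total
```

Calling usage_kb() on each entry of tmpfs_mount_points(open("/proc/mounts").read())
would show whether any tmpfs instance holds a large amount of data, which
would be backed by RAM or swap rather than by disk.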

-- 
SUSE Labs, Novell Inc.
