Google's mm problem - not reproduced on 2.4.13

public inbox for linux-kernel@vger.kernel.org

* Google's mm problem - not reproduced on 2.4.13
@ 2001-10-31 18:06 Daniel Phillips
  2001-10-31 20:39 ` Daniel Phillips
  0 siblings, 1 reply; 37+ messages in thread
From: Daniel Phillips @ 2001-10-31 18:06 UTC (permalink / raw)
  To: linux-kernel; +Cc: Rik van Riel, Andrea Arcangeli

Good morning Ben,

I just tried your test program with 2.4.13, 2 Gig, and it ran without 
problems.  Could you try that over there and see if you get the same result?
If it does run, the next move would be to check with 3.5 Gig.

Regards,

Daniel

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: Google's mm problem - not reproduced on 2.4.13
  2001-10-31 18:06 Google's mm problem - not reproduced on 2.4.13 Daniel Phillips
@ 2001-10-31 20:39 ` Daniel Phillips
  2001-10-31 20:45   ` Andrea Arcangeli
  2001-10-31 20:48   ` Rik van Riel
  0 siblings, 2 replies; 37+ messages in thread
From: Daniel Phillips @ 2001-10-31 20:39 UTC (permalink / raw)
  To: linux-kernel; +Cc: Rik van Riel, Andrea Arcangeli

On October 31, 2001 07:06 pm, Daniel Phillips wrote:
> I just tried your test program with 2.4.13, 2 Gig, and it ran without 
> problems.  Could you try that over there and see if you get the same result?
> If it does run, the next move would be to check with 3.5 Gig.

Ben reports that his test with 2 Gig memory runs fine, as it does for me, but 
that it locks up tight with 3.5 Gig, requiring power cycle.  Since I only 
have 2 Gig here I can't reproduce that (yet).

--
Daniel

* Re: Google's mm problem - not reproduced on 2.4.13
  2001-10-31 20:39 ` Daniel Phillips
@ 2001-10-31 20:45   ` Andrea Arcangeli
  2001-10-31 21:03     ` Daniel Phillips
  2001-11-01  0:19     ` Daniel Phillips
  2001-10-31 20:48   ` Rik van Riel
  1 sibling, 2 replies; 37+ messages in thread
From: Andrea Arcangeli @ 2001-10-31 20:45 UTC (permalink / raw)
  To: Daniel Phillips; +Cc: linux-kernel, Rik van Riel

On Wed, Oct 31, 2001 at 09:39:12PM +0100, Daniel Phillips wrote:
> On October 31, 2001 07:06 pm, Daniel Phillips wrote:
> > I just tried your test program with 2.4.13, 2 Gig, and it ran without 
> > problems.  Could you try that over there and see if you get the same result?
> > If it does run, the next move would be to check with 3.5 Gig.
> 
> Ben reports that his test with 2 Gig memory runs fine, as it does for me, but 
> that it locks up tight with 3.5 Gig, requiring power cycle.  Since I only 
> have 2 Gig here I can't reproduce that (yet).

are you sure it isn't an oom condition. can you reproduce on
2.4.14pre5aa1? mainline (at least before pre6) could deadlock with too
much mlocked memory.

Andrea

* Re: Google's mm problem - not reproduced on 2.4.13
  2001-10-31 20:45   ` Andrea Arcangeli
@ 2001-10-31 21:03     ` Daniel Phillips
  2001-10-31 21:53       ` Andreas Dilger
  2001-10-31 22:12       ` Google's mm problem - not reproduced on 2.4.13 Ben Smith
  2001-11-01  0:19     ` Daniel Phillips
  1 sibling, 2 replies; 37+ messages in thread
From: Daniel Phillips @ 2001-10-31 21:03 UTC (permalink / raw)
  To: Andrea Arcangeli; +Cc: linux-kernel, Rik van Riel, Ben Smith

On October 31, 2001 09:45 pm, Andrea Arcangeli wrote:
> On Wed, Oct 31, 2001 at 09:39:12PM +0100, Daniel Phillips wrote:
> > On October 31, 2001 07:06 pm, Daniel Phillips wrote:
> > > I just tried your test program with 2.4.13, 2 Gig, and it ran without 
> > > problems.  Could you try that over there and see if you get the same result?
> > > If it does run, the next move would be to check with 3.5 Gig.
> > 
> > Ben reports that his test with 2 Gig memory runs fine, as it does for me, but 
> > that it locks up tight with 3.5 Gig, requiring power cycle.  Since I only 
> > have 2 Gig here I can't reproduce that (yet).
> 
> are you sure it isn't an oom condition. can you reproduce on
> 2.4.14pre5aa1? mainline (at least before pre6) could deadlock with too
> much mlocked memory.

I don't know, I can't reproduce it here, I don't have enough memory.  Ben?

--
Daniel

* Re: Google's mm problem - not reproduced on 2.4.13
  2001-10-31 21:03     ` Daniel Phillips
@ 2001-10-31 21:53       ` Andreas Dilger
  2001-11-01  4:52         ` Daniel Phillips
  2001-10-31 22:12       ` Google's mm problem - not reproduced on 2.4.13 Ben Smith
  1 sibling, 1 reply; 37+ messages in thread
From: Andreas Dilger @ 2001-10-31 21:53 UTC (permalink / raw)
  To: Daniel Phillips; +Cc: linux-kernel

On Oct 31, 2001  22:03 +0100, Daniel Phillips wrote:
> Ben reports that his test with 2 Gig memory runs fine, as it does for me, but 
> that it locks up tight with 3.5 Gig, requiring power cycle.  Since I only 
> have 2 Gig here I can't reproduce that (yet).

Sadly, I bought some memory yesterday, and it was only U$30 for 256MB
DIMMs, so $120/GB if you have enough slots.  Not that I'm suggesting
you go out and buy more memory Daniel, as you probably have your slots
filled with 2GB, and larger sticks are still a bit more expensive.

The only thing that bugs me about the low memory price is that Windows
XP recommends at least 128MB for a workable system.  A year or two ago
that would have been considered a bloated pig, but now they are giving
away 128MB DIMMs with a purchase of XP.  Sad, really.  Maybe M$ is
subsidizing the chipmakers to make RAM cheap so XP can run on people's
computers ;-)?  What else would you do with U$50 billion in cash (or
whatever) that M$ has?

Cheers, Andreas
--
Andreas Dilger
http://sourceforge.net/projects/ext2resize/
http://www-mddsp.enel.ucalgary.ca/People/adilger/

* Re: Google's mm problem - not reproduced on 2.4.13
  2001-10-31 21:53       ` Andreas Dilger
@ 2001-11-01  4:52         ` Daniel Phillips
  2001-11-01 16:56           ` undefined reference in 2.2.19 build with Reiserfs (was: Google's mm problem - not reproduced on 2.4.13) Sven Heinicke
  0 siblings, 1 reply; 37+ messages in thread
From: Daniel Phillips @ 2001-11-01  4:52 UTC (permalink / raw)
  To: Andreas Dilger; +Cc: linux-kernel

On October 31, 2001 10:53 pm, Andreas Dilger wrote:
> On Oct 31, 2001  22:03 +0100, Daniel Phillips wrote:
> > Ben reports that his test with 2 Gig memory runs fine, as it does for me, but 
> > that it locks up tight with 3.5 Gig, requiring power cycle.  Since I only 
> > have 2 Gig here I can't reproduce that (yet).
> 
> Sadly, I bought some memory yesterday, and it was only U$30 for 256MB
> DIMMs, so $120/GB if you have enough slots.  Not that I'm suggesting
> you go out and buy more memory Daniel, as you probably have your slots
> filled with 2GB, and larger sticks are still a bit more expensive.

You're not kidding.  Just FYI, four 1GB sticks for this machine will set you
back a kilobuck.  (PC/133 Registered SDRAM 72-bit ECC, 168-pin gold-plated
DIMM)

--
Daniel

* undefined reference in 2.2.19 build with Reiserfs  (was: Google's mm problem - not reproduced on 2.4.13)
  2001-11-01  4:52         ` Daniel Phillips
@ 2001-11-01 16:56           ` Sven Heinicke
  2001-11-01 22:39             ` Keith Owens
  0 siblings, 1 reply; 37+ messages in thread
From: Sven Heinicke @ 2001-11-01 16:56 UTC (permalink / raw)
  To: linux-kernel


Hi,

We have a couple of Dells with 4G of memory.  We have been
experiencing the same Google problems.  My boss has asked me to roll
one of them back to a 2.2 kernel; we will live with 2G of memory for
now while I help out with testing the mmap problems.  I am, however,
having problems compiling the 2.2.19 kernel with the
linux-2.2.19-reiserfs-3.5.34-patch.bz2 patch.  I get the following
error:

ld -m elf_i386 -T /home/sven/linux-build/linux-2.2.19-reiserfs/arch/i386/vmlinux.lds -e stext arch/i386/kernel/head.o arch/i386/kernel/init_task.o init/main.o init/version.o \
	--start-group \
	arch/i386/kernel/kernel.o arch/i386/mm/mm.o kernel/kernel.o mm/mm.o fs/fs.o ipc/ipc.o \
	fs/filesystems.a \
	net/network.a \
	drivers/block/block.a drivers/char/char.o drivers/misc/misc.a drivers/net/net.a drivers/scsi/scsi.a drivers/pci/pci.a drivers/video/video.a \
	/home/sven/linux-build/linux-2.2.19-reiserfs/arch/i386/lib/lib.a /home/sven/linux-build/linux-2.2.19-reiserfs/lib/lib.a /home/sven/linux-build/linux-2.2.19-reiserfs/arch/i386/lib/lib.a \
	--end-group \
	-o vmlinux
fs/filesystems.a(reiserfs.o): In function `ip_check_balance':
reiserfs.o(.text+0x9cc2): undefined reference to `memset'
drivers/scsi/scsi.a(aic7xxx.o): In function `aic7xxx_load_seeprom':
aic7xxx.o(.text+0x117ff): undefined reference to `memcpy'
make: *** [vmlinux] Error 1

I also tried it with the 2.2.20-pre12 patch applied but got the same
(or similar) results.

After I get this 2.2.19 system stable, my boss says fixing the Google
bug is my "top priority".  Unfortunately, this will likely be only the
second time I dig into Linux source code, so I expect it will mostly be
me testing other people's patches.  But I will try my best.

Here is info from my 2.2.19 system as asked for in the REPORTING-BUGS
file:

This is a Red Hat 7.1 system.

$ source scripts/ver_linux 
Linux ps1.web.nj.nec.com 2.4.9-marcelo-patch #10 SMP Wed Aug 22 15:13:48 EDT 2001 i686 unknown
 
Gnu C                  2.96
Gnu make               3.79.1
binutils               2.10.91.0.2
util-linux             2.11b
modutils               2.4.2
e2fsprogs              1.19
reiserfsprogs          3.x.0f
Linux C Library        2.2.2
Dynamic linker (ldd)   2.2.2
Procps                 2.0.7
Net-tools              1.57
Console-tools          0.3.3
Sh-utils               2.0
Modules Loaded         autofs eepro100 md

The Marcelo patch is from the last time I stuck my nose in the kernel,
with a highmem patch.  Comparing my diff against the 2.4.13 kernel, the
changes are nearly the same.


$ cat /proc/cpuinfo 
processor	: 0
vendor_id	: GenuineIntel
cpu family	: 6
model		: 8
model name	: Pentium III (Coppermine)
stepping	: 6
cpu MHz		: 993.400
cache size	: 256 KB
fdiv_bug	: no
hlt_bug		: no
f00f_bug	: no
coma_bug	: no
fpu		: yes
fpu_exception	: yes
cpuid level	: 2
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 mmx fxsr sse
bogomips	: 1979.18

processor	: 1
vendor_id	: GenuineIntel
cpu family	: 6
model		: 8
model name	: Pentium III (Coppermine)
stepping	: 6
cpu MHz		: 993.400
cache size	: 256 KB
fdiv_bug	: no
hlt_bug		: no
f00f_bug	: no
coma_bug	: no
fpu		: yes
fpu_exception	: yes
cpuid level	: 2
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 mmx fxsr sse
bogomips	: 1985.74

$ cat /proc/scsi/scsi 
Attached devices: 
Host: scsi0 Channel: 00 Id: 00 Lun: 00
  Vendor: SEAGATE  Model: ST173404LC       Rev: 0004
  Type:   Direct-Access                    ANSI SCSI revision: 03
Host: scsi0 Channel: 00 Id: 01 Lun: 00
  Vendor: SEAGATE  Model: ST173404LC       Rev: 0004
  Type:   Direct-Access                    ANSI SCSI revision: 03
Host: scsi0 Channel: 00 Id: 02 Lun: 00
  Vendor: SEAGATE  Model: ST173404LC       Rev: 0004
  Type:   Direct-Access                    ANSI SCSI revision: 03
Host: scsi0 Channel: 00 Id: 03 Lun: 00
  Vendor: SEAGATE  Model: ST173404LC       Rev: 0004
  Type:   Direct-Access                    ANSI SCSI revision: 03
Host: scsi0 Channel: 00 Id: 04 Lun: 00
  Vendor: SEAGATE  Model: ST173404LC       Rev: 0004
  Type:   Direct-Access                    ANSI SCSI revision: 03
Host: scsi0 Channel: 00 Id: 05 Lun: 00
  Vendor: SEAGATE  Model: ST173404LC       Rev: 0004
  Type:   Direct-Access                    ANSI SCSI revision: 03
Host: scsi0 Channel: 00 Id: 06 Lun: 00
  Vendor: DELL     Model: 1x6 U2W SCSI BP  Rev: 5.35
  Type:   Processor                        ANSI SCSI revision: 02
Host: scsi2 Channel: 00 Id: 05 Lun: 00
  Vendor: NEC      Model: CD-ROM DRIVE:466 Rev: 1.06
  Type:   CD-ROM                           ANSI SCSI revision: 02

bash-2.04$ lsmod
Module                  Size  Used by
autofs                 11920   1 (autoclean)
eepro100               17184   1 (autoclean)
md                     43616   0 (unused)

* Re: undefined reference in 2.2.19 build with Reiserfs (was: Google's mm problem - not reproduced on 2.4.13)
  2001-11-01 16:56           ` undefined reference in 2.2.19 build with Reiserfs (was: Google's mm problem - not reproduced on 2.4.13) Sven Heinicke
@ 2001-11-01 22:39             ` Keith Owens
  0 siblings, 0 replies; 37+ messages in thread
From: Keith Owens @ 2001-11-01 22:39 UTC (permalink / raw)
  To: Sven Heinicke; +Cc: linux-kernel

On Thu, 1 Nov 2001 11:56:04 -0500 (EST), 
Sven Heinicke <sven@research.nj.nec.com> wrote:
>fs/filesystems.a(reiserfs.o): In function `ip_check_balance':
>reiserfs.o(.text+0x9cc2): undefined reference to `memset'
>drivers/scsi/scsi.a(aic7xxx.o): In function `aic7xxx_load_seeprom':
>aic7xxx.o(.text+0x117ff): undefined reference to `memcpy'

The aic7xxx reference to memcpy is a gcc feature.  If you do an
assignment of a complete structure then gcc may convert that into a
call to memcpy().  Alas, gcc does the conversion using the "standard"
version of memcpy, not the "optimized by cpp" version that the kernel
uses.  Try this patch:

Index: 19.1/drivers/scsi/aic7xxx.c
--- 19.1/drivers/scsi/aic7xxx.c Tue, 13 Feb 2001 08:26:08 +1100 kaos (linux-2.2/d/b/43_aic7xxx.c 1.1.1.3.2.1.3.1.1.3 644)
+++ 19.1(w)/drivers/scsi/aic7xxx.c Fri, 02 Nov 2001 09:36:49 +1100 kaos (linux-2.2/d/b/43_aic7xxx.c 1.1.1.3.2.1.3.1.1.3 644)
@@ -9190,7 +9190,7 @@ aic7xxx_load_seeprom(struct aic7xxx_host
         p->flags |= AHC_TERM_ENB_SE_LOW | AHC_TERM_ENB_SE_HIGH;
       }
     }
-    p->sc = *sc;
+    memcpy(&(p->sc), sc, sizeof(p->sc));
   }
 
   p->discenable = 0;

Cannot help with the reiserfs problem, the code is not in the pristine
2.2.19 tree.
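A minimal userspace sketch of the two forms in the patch above. The struct and function names are hypothetical stand-ins (the real aic7xxx SEEPROM layout is not needed to show the point); both functions produce identical copies, and the difference Keith describes is only in how the compiler realizes the first one.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical 64-byte stand-in for the aic7xxx SEEPROM structure. */
struct seeprom_config {
    unsigned short words[32];
};

struct host_sketch {
    int flags;
    struct seeprom_config sc;
};

/* Original form: whole-structure assignment.  The compiler chooses how to
 * copy; for a struct this size gcc may emit a call to an out-of-line
 * memcpy symbol, which the 2.2 kernel link above could not resolve. */
static void copy_by_assignment(struct host_sketch *p,
                               const struct seeprom_config *sc)
{
    p->sc = *sc;
}

/* Patched form: an explicit memcpy() call.  Inside the kernel this is
 * resolved through the kernel's own string headers; in this userspace
 * sketch it is simply the C library's memcpy(). */
static void copy_by_memcpy(struct host_sketch *p,
                           const struct seeprom_config *sc)
{
    assert(p != NULL && sc != NULL);
    memcpy(&(p->sc), sc, sizeof(p->sc));
}
```

Compiling the first function with `gcc -O2 -S` and searching the output for `memcpy` is one way to see whether a given gcc lowered the assignment to a call; the result varies with compiler version, flags and struct size.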


* Re: Google's mm problem - not reproduced on 2.4.13
  2001-10-31 21:03     ` Daniel Phillips
  2001-10-31 21:53       ` Andreas Dilger
@ 2001-10-31 22:12       ` Ben Smith
  2001-11-01  0:34         ` Andrea Arcangeli
  2001-11-02 17:51         ` Sven Heinicke
  1 sibling, 2 replies; 37+ messages in thread
From: Ben Smith @ 2001-10-31 22:12 UTC (permalink / raw)
  To: Daniel Phillips; +Cc: Andrea Arcangeli, linux-kernel, Rik van Riel

 > On October 31, 2001 09:45 pm, Andrea Arcangeli wrote:
 >
 >>On Wed, Oct 31, 2001 at 09:39:12PM +0100, Daniel Phillips wrote:
 >>
 >>>On October 31, 2001 07:06 pm, Daniel Phillips wrote:
 >>>
 >>>>I just tried your test program with 2.4.13, 2 Gig, and it ran
 >>>>without problems.  Could you try that over there and see if you
 >>>>get the same result?  If it does run, the next move would be to
 >>>>check with 3.5 Gig.
 >>>>
 >>>Ben reports that his test with 2 Gig memory runs fine, as it does
 >>>for me, but that it locks up tight with 3.5 Gig, requiring power
 >>>cycle.  Since I only have 2 Gig here I can't reproduce that (yet).
 >>>
 >>are you sure it isn't an oom condition. can you reproduce on
 >>2.4.14pre5aa1? mainline (at least before pre6) could deadlock with
 >>too much mlocked memory.
 >>
 >
 > I don't know, I can't reproduce it here, I don't have enough memory.
 > Ben?

My test application gets killed (I believe by the oom handler). dmesg
complains about a lot of 0-order allocation failures. For this test,
I'm running with 2.4.14pre5aa1, 3.5gb of RAM, 2 PIII 1Ghz.
  - Ben

Ben Smith
Google, Inc


* Re: Google's mm problem - not reproduced on 2.4.13
  2001-10-31 22:12       ` Google's mm problem - not reproduced on 2.4.13 Ben Smith
@ 2001-11-01  0:34         ` Andrea Arcangeli
  2001-11-02 17:51         ` Sven Heinicke
  1 sibling, 0 replies; 37+ messages in thread
From: Andrea Arcangeli @ 2001-11-01  0:34 UTC (permalink / raw)
  To: Ben Smith; +Cc: Daniel Phillips, linux-kernel, Rik van Riel

On Wed, Oct 31, 2001 at 02:12:00PM -0800, Ben Smith wrote:
> My test application gets killed (I believe by the oom handler). dmesg
> complains about a lot of 0-order allocation failures. For this test,
> I'm running with 2.4.14pre5aa1, 3.5gb of RAM, 2 PIII 1Ghz.

Interesting, now we need to find out if the problem is the allocator in
2.4.14pre5aa1 that fails too early by mistake, or if this is a true oom
condition. I tend to think it's a true oom condition since mainline
deadlocked under the same workload where -aa correctly killed the task.

Can you provide also a 'vmstat 1' trace of the last 20/30 seconds before
the task gets killed?

A true oom condition could be caused by a memleak in mlock or something
like that (or of course it could be a bug in the userspace testcase, but
I checked the testcase a few weeks ago and I didn't find anything wrong
in it).

Andrea

* Re: Google's mm problem - not reproduced on 2.4.13
  2001-10-31 22:12       ` Google's mm problem - not reproduced on 2.4.13 Ben Smith
  2001-11-01  0:34         ` Andrea Arcangeli
@ 2001-11-02 17:51         ` Sven Heinicke
  2001-11-02 18:00           ` Andrea Arcangeli
  2001-11-02 18:11           ` Daniel Phillips
  1 sibling, 2 replies; 37+ messages in thread
From: Sven Heinicke @ 2001-11-02 17:51 UTC (permalink / raw)
  To: linux-kernel; +Cc: Daniel Phillips, Ben Smith, Andrea Arcangeli, Rik van Riel

Ben Smith writes:
 >  > On October 31, 2001 09:45 pm, Andrea Arcangeli wrote:
 >  >
 >  >>On Wed, Oct 31, 2001 at 09:39:12PM +0100, Daniel Phillips wrote:
 >  >>
 >  >>>On October 31, 2001 07:06 pm, Daniel Phillips wrote:
 >  >>>
 >  >>>>I just tried your test program with 2.4.13, 2 Gig, and it ran
 >  >>>>without problems.  Could you try that over there and see if you
 >  >>>>get the same result?  If it does run, the next move would be to
 >  >>>>check with 3.5 Gig.
 >  >>>>
 >  >>>Ben reports that his test with 2 Gig memory runs fine, as it does
 >  >>>for me, but that it locks up tight with 3.5 Gig, requiring power
 >  >>>cycle.  Since I only have 2 Gig here I can't reproduce that (yet).
 >  >>>
 >  >>are you sure it isn't an oom condition. can you reproduce on
 >  >>2.4.14pre5aa1? mainline (at least before pre6) could deadlock with
 >  >>too much mlocked memory.
 >  >>
 >  >
 >  > I don't know, I can't reproduce it here, I don't have enough memory.
 >  > Ben?
 > 
 > My test application gets killed (I believe by the oom handler). dmesg
 > complains about a lot of 0-order allocation failures. For this test,
 > I'm running with 2.4.14pre5aa1, 3.5gb of RAM, 2 PIII 1Ghz.
 >   - Ben
 > 
 > Ben Smith
 > Google, Inc
 > 


This is a system with 4G of memory and regular swap, with two 1GHz
Pentium III processors.

On 2.4.14-pre6aa1 it happily runs until:

munmap'ed 7317d000
Loading data at 7317d000 for slot 2
Load (/mnt/sdb/sven/chunk10) succeeded!
mlocking slot 2, 7317d000
mlocking at 7317d000 of size 1048576
Connection to hera closed by remote host.
Connection to hera closed.

Where it kills my ssh and other programs, and fills my /var/log/messages
with:

Nov  2 11:29:07 ps2 kernel: __alloc_pages: 0-order allocation failed (gfp=0x1d2/0)
Nov  2 11:29:07 ps2 syslogd: select: Cannot allocate memory
Nov  2 11:29:07 ps2 kernel: __alloc_pages: 0-order allocation failed (gfp=0x1d2/0)
Nov  2 11:29:07 ps2 kernel: __alloc_pages: 0-order allocation failed (gfp=0x1f0/0)
Nov  2 11:29:07 ps2 last message repeated 2 times

a bunch of times.  Then it doesn't free the mmapped memory until the file
system is unmounted.  It never starts going into swap.

2.4.14-pre5aa1 does about the same.

	       Sven

* Re: Google's mm problem - not reproduced on 2.4.13
  2001-11-02 17:51         ` Sven Heinicke
@ 2001-11-02 18:00           ` Andrea Arcangeli
  2001-11-02 18:19             ` Daniel Phillips
  2001-11-02 18:11           ` Daniel Phillips
  1 sibling, 1 reply; 37+ messages in thread
From: Andrea Arcangeli @ 2001-11-02 18:00 UTC (permalink / raw)
  To: Sven Heinicke; +Cc: linux-kernel, Daniel Phillips, Ben Smith, Rik van Riel

On Fri, Nov 02, 2001 at 12:51:09PM -0500, Sven Heinicke wrote:
> a bunch of times.  Then it doesn't free the mmapped memory until the file
> system is unmounted.  It never starts going into swap.

thanks for testing. This matches the idea that those pages don't want
to be unmapped for whatever reason (and because there's an mlock in our
way at the moment I'd tend to point my finger in that direction rather
than into the vm direction). I'll look more closely into this testcase
shortly.

Andrea

* Re: Google's mm problem - not reproduced on 2.4.13
  2001-11-02 18:00           ` Andrea Arcangeli
@ 2001-11-02 18:19             ` Daniel Phillips
  2001-11-02 20:27               ` Linus Torvalds
       [not found]               ` <200111022027.fA2KRwe20006@penguin.transmeta.com>
  0 siblings, 2 replies; 37+ messages in thread
From: Daniel Phillips @ 2001-11-02 18:19 UTC (permalink / raw)
  To: Andrea Arcangeli, Sven Heinicke; +Cc: linux-kernel, Ben Smith, Rik van Riel

On November 2, 2001 07:00 pm, Andrea Arcangeli wrote:
> On Fri, Nov 02, 2001 at 12:51:09PM -0500, Sven Heinicke wrote:
> > a bunch of times.  Then doesn't free the mmaped memory until file
> > system is unmounted.  It never starts going into swap.
> 
> thanks for testing. This matches the idea that those pages don't want
> to be unmapped for whatever reason (and because there's an mlock in our
> way at the moment I'd tend to point my finger in that direction rather
> than into the vm direction). I'll look more closely into this testcase
> shortly.

The mlock handling looks dead simple:

vmscan.c, line 227:
        if (vma->vm_flags & (VM_LOCKED|VM_RESERVED))
                return count;

It's hard to see how that could be wrong.  Plus, this test program does run 
under 2.4.9, it just uses way too much CPU on that kernel.  So I'd say mm 
bug.
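A toy model of what that check means for the caller, as a sketch only: it is not the 2.4 swap_out_vma(), the struct and function names are hypothetical, and the flag values are illustrative. A locked or reserved mapping hands the scan budget back untouched, so it never contributes freed pages.

```c
#include <assert.h>
#include <stddef.h>

/* Flag values chosen for illustration only; the kernel defines its own
 * in <linux/mm.h>. */
#define VM_LOCKED   0x00002000UL
#define VM_RESERVED 0x00080000UL

struct vma_sketch {
    unsigned long vm_flags;
    int mapped_pages;       /* pages this mapping could still give up */
};

/* Simplified model of the quoted check: a locked or reserved mapping is
 * skipped, so the remaining scan budget comes back unchanged.  Otherwise
 * pages are "unmapped" one at a time until the budget runs out. */
static int swap_out_vma_sketch(struct vma_sketch *vma, int count)
{
    assert(vma != NULL);
    if (vma->vm_flags & (VM_LOCKED | VM_RESERVED))
        return count;
    while (count > 0 && vma->mapped_pages > 0) {
        vma->mapped_pages--;
        count--;
    }
    return count;
}
```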

--
Daniel

* Re: Google's mm problem - not reproduced on 2.4.13
  2001-11-02 18:19             ` Daniel Phillips
@ 2001-11-02 20:27               ` Linus Torvalds
  2001-11-02 21:08                 ` Ben Smith
  2001-11-02 21:12                 ` Google's mm problem - not reproduced on 2.4.13 Rik van Riel
       [not found]               ` <200111022027.fA2KRwe20006@penguin.transmeta.com>
  1 sibling, 2 replies; 37+ messages in thread
From: Linus Torvalds @ 2001-11-02 20:27 UTC (permalink / raw)
  To: linux-kernel

In article <20011102181758Z16039-4784+420@humbolt.nl.linux.org>,
Daniel Phillips  <phillips@bonn-fries.net> wrote:
>
>It's hard to see how that could be wrong.  Plus, this test program does run 
>under 2.4.9, it just uses way too much CPU on that kernel.  So I'd say mm 
>bug.

So how much memory is mlocked?

The locked memory will stay in the inactive list (it won't even ever be
activated, because we don't bother even scanning the mapped locked
regions), and the inactive list fills up with pages that are completely
worthless. 

And the kernel will decide that because most of the unfreeable pages are
mapped, it needs to do VM scanning, which obviously doesn't help.
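A deliberately tiny model of the effect described above, assuming a single inactive list where "freeing" is just a flag (the real 2.4 VM ages pages and moves them between lists). All names are hypothetical. Locked pages are skipped but never leave the list, so once only they remain, every pass walks the whole list and frees nothing.

```c
#include <assert.h>
#include <stddef.h>

struct page_sketch {
    int mlocked;    /* page belongs to an mlocked mapping */
    int freed;
};

/* One reclaim pass over a toy inactive list: every page that is neither
 * locked nor already freed is freed.  Locked pages are skipped but stay on
 * the list, so each later pass scans them again and gains nothing.
 * Returns the number of pages freed by this pass. */
static size_t scan_inactive_sketch(struct page_sketch *list, size_t n)
{
    size_t freed = 0;
    assert(list != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (list[i].mlocked || list[i].freed)
            continue;
        list[i].freed = 1;
        freed++;
    }
    return freed;
}
```

With six locked pages out of eight, the first pass frees two and every later pass returns 0 while still visiting all eight entries.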

Why _does_ this thing do mlock, anyway? What's the point? And how much
does it try to lock?

If root wants to shoot himself in the head by mlocking all of memory,
that's not a VM problem, that's a stupid administrator problem.

		Linus

* Re: Google's mm problem - not reproduced on 2.4.13
  2001-11-02 20:27               ` Linus Torvalds
@ 2001-11-02 21:08                 ` Ben Smith
  2001-11-02 21:20                   ` Linus Torvalds
  2001-11-02 21:12                 ` Google's mm problem - not reproduced on 2.4.13 Rik van Riel
  1 sibling, 1 reply; 37+ messages in thread
From: Ben Smith @ 2001-11-02 21:08 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-kernel

> So how much memory is mlocked?


In the 3.5G case, we lock 4 blocks (4 * 427683520 bytes, or about 1631MB).
There is code in the kernel that prevents more than 1/2 of all physical 
pages from being mlocked:

mlock.c:215-218: (in do_mlock)

	/* we may lock at most half of physical memory... */
	/* (this check is pretty bogus, but doesn't hurt) */
	if (locked > num_physpages/2)
		goto out;


For 2.2 we have a patch that increases this to 90% or 60M, but we 
don't use this patch on 2.4 yet.
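The arithmetic above, checked in a small sketch. It assumes 4KB pages, that the four blocks together are what counts against the limit, and that num_physpages equals exactly 3.5GB worth of pages (the real figure on a running box is somewhat lower). The helper names are hypothetical; under_half_limit() only mirrors the quoted do_mlock() comparison.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_BYTES 4096ULL              /* assumed i386 page size */

/* Round a byte count up to whole pages. */
static uint64_t bytes_to_pages(uint64_t bytes)
{
    return (bytes + PAGE_BYTES - 1) / PAGE_BYTES;
}

/* Mirrors the quoted test: the request fails once the number of locked
 * pages would exceed half of num_physpages. */
static int under_half_limit(uint64_t locked_pages, uint64_t num_physpages)
{
    return !(locked_pages > num_physpages / 2);
}
```

Under those assumptions the four blocks need 417660 pages against a limit of 458752, leaving roughly 160MB of headroom.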


> Why _does_ this thing do mlock, anyway? What's the point? And how much
> does it try to lock?


Latency. We know exactly what data should remain in memory, so we're 
trying to prevent the vm from paging out the wrong data. It makes a huge 
difference in performance.
  - Ben

Ben Smith
Google, Inc.


* Re: Google's mm problem - not reproduced on 2.4.13
  2001-11-02 21:08                 ` Ben Smith
@ 2001-11-02 21:20                   ` Linus Torvalds
  2001-11-02 22:42                     ` Ben Smith
  0 siblings, 1 reply; 37+ messages in thread
From: Linus Torvalds @ 2001-11-02 21:20 UTC (permalink / raw)
  To: linux-kernel

In article <3BE30B3D.1080505@google.com>, Ben Smith  <ben@google.com> wrote:
>
>For 2.2 we have a patch that increases this to 90% or 60M, but we 
>don't use this patch on 2.4 yet.

Well, you'll also deadlock your machine if you happen to lock down the
low memory area on x86. Sounds like a _bad_ idea.

Anyway, I posted a suggested patch that should fix the behaviour, but it
doesn't fix the fundamental problem with locking the wrong kinds of
pages (ie you're definitely on your own if you happen to lock down most
of the low 1GB of an intel machine).
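A sketch of why "the low 1GB" is the sensitive part. It assumes the common i386 configuration of that era, where the kernel directly maps only about the first 896MB of physical memory and most kernel allocations must come from that range; the helper names are hypothetical. On the 3.5GB machines in this thread, that is only a quarter of RAM.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed default i386 configuration: about the first 896MB of physical
 * memory is directly mapped by the kernel (1GB of kernel address space
 * minus room reserved for vmalloc and similar).  The rest is highmem. */
#define LOWMEM_LIMIT (896ULL << 20)

static int is_lowmem(uint64_t phys_addr)
{
    return phys_addr < LOWMEM_LIMIT;
}

/* Bytes of low memory on a machine with 'total' bytes of RAM. */
static uint64_t lowmem_bytes(uint64_t total)
{
    return total < LOWMEM_LIMIT ? total : LOWMEM_LIMIT;
}
```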

>Latency. We know exactly what data should remain in memory, so we're 
>trying to prevent the vm from paging out the wrong data. It makes a huge 
>difference in performance.

It would be interesting to hear whether that is equally true in the new
VM that doesn't necessarily page stuff out unless it can show that the
memory pressure is actually from VM mappings.

How big is your mlock area during real load? Still the "max the kernel
will allow"? Or is that just a benchmark/test kind of thing?

		Linus

* Re: Google's mm problem - not reproduced on 2.4.13
  2001-11-02 21:20                   ` Linus Torvalds
@ 2001-11-02 22:42                     ` Ben Smith
  2001-11-02 23:15                       ` Daniel Phillips
  2001-11-03 22:53                       ` Adaptec vs Symbios performance Stephan von Krawczynski
  0 siblings, 2 replies; 37+ messages in thread
From: Ben Smith @ 2001-11-02 22:42 UTC (permalink / raw)
  To: linux-kernel

> Anyway, I posted a suggested patch that should fix the behaviour, but it
> doesn't fix the fundamental problem with locking the wrong kinds of
> pages (ie you're definitely on your own if you happen to lock down most
> of the low 1GB of an intel machine).


I've tried the patch you sent and it doesn't help. I applied the patch 
to 2.4.13-pre7 and it hung the machine in the same way (ctrl-alt-del 
didn't work). The last few lines of vmstat before the machine hung look 
like this:
  0  1  0      0 133444   5132 3367312   0   0 31196     0 1121  2123  0   6  94
  0  1  0      0  63036   5216 3435920   0   0 34338    14 1219  2272  0   5  95
  2  0  1      0   6156   1828 3494904   0   0 31268     0 1130  2198  0  23  77
  1  0  1      0   3596    864 3498488   0   0  2720    16 1640  1068  0  88  12


> It would be interesting to hear whether that is equally true in the new
> VM that doesn't necessarily page stuff out unless it can show that the
> memory pressure is actually from VM mappings.
> 
> How big is your mlock area during real load? Still the "max the kernel
> will allow"? Or is that just a benchmark/test kind of thing?

I haven't had a chance to try my real app yet, but my test application 
is a good simulation of what the real program does, minus actually
accessing the data that it maps. Since it's the only application 
running, and for performance reasons we'd need all of our data in 
memory, we map the "max the kernel will allow".

As another note, I've re-written my test application to use madvise 
instead of mlock, on a suggestion from Andrea. It also doesn't work. For 
2.4.13, after running for a while, my test app hangs, using one CPU, and 
kswapd consumes the other CPU. I was eventually able to kill my test app.
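The test program itself is not included in this thread, so the following is only a hypothetical sketch of the pattern the logs suggest: map a data file read-only, then pin it in fixed-size pieces (the traces show "mlocking at ... of size 1048576"), with madvise(MADV_WILLNEED) swapped in for the variant described above. map_chunk() and pin_mode are invented names.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* How to ask the kernel to keep a mapped chunk resident (hypothetical). */
enum pin_mode { PIN_MLOCK, PIN_MADVISE };

/* Map a whole data file read-only, then pin it piece by piece, either with
 * mlock() or with the weaker madvise(MADV_WILLNEED) hint.  'piece' must be
 * a multiple of the page size.  On success returns the mapping and stores
 * its length in *len_out; on failure returns MAP_FAILED. */
static void *map_chunk(const char *path, enum pin_mode mode,
                       size_t piece, size_t *len_out)
{
    assert(piece > 0 && piece % (size_t)sysconf(_SC_PAGESIZE) == 0);

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return MAP_FAILED;
    struct stat st;
    if (fstat(fd, &st) < 0 || st.st_size == 0) {
        close(fd);
        return MAP_FAILED;
    }
    size_t len = (size_t)st.st_size;
    void *addr = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);                  /* the mapping keeps its own file reference */
    if (addr == MAP_FAILED)
        return MAP_FAILED;

    for (size_t off = 0; off < len; off += piece) {
        size_t n = (len - off < piece) ? len - off : piece;
        char *p = (char *)addr + off;
        int rc = (mode == PIN_MLOCK) ? mlock(p, n)
                                     : madvise(p, n, MADV_WILLNEED);
        if (rc != 0) {
            munmap(addr, len);  /* also drops any locks taken so far */
            return MAP_FAILED;
        }
    }
    *len_out = len;
    return addr;
}
```

Releasing a slot is a plain munmap(addr, len), which also undoes any mlock(). The design difference that matters here: MADV_WILLNEED only requests readahead, so those pages stay ordinary reclaimable page cache, while mlock()ed pages cannot be evicted at all.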

I've also re-written my test app to use anonymous mmap, followed by a 
mlock and read()s. This actually does work without problems, but 
doesn't really do what we want for other reasons.
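A hypothetical sketch of the variant that works: anonymous memory is mapped and locked first, then filled with read(), so the data becomes a private copy rather than page cache shared with the file. The function name is invented, and this is not Ben's actual code.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Allocate 'len' bytes of anonymous memory, lock it, and fill it from the
 * start of 'path' with read().  Returns the buffer, or MAP_FAILED on any
 * error, including a file shorter than 'len'. */
static void *load_anon_locked(const char *path, size_t len)
{
    assert(len > 0);
    void *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED)
        return MAP_FAILED;
    if (mlock(buf, len) != 0) {
        munmap(buf, len);
        return MAP_FAILED;
    }

    int fd = open(path, O_RDONLY);
    if (fd < 0) {
        munmap(buf, len);
        return MAP_FAILED;
    }
    size_t done = 0;
    while (done < len) {
        ssize_t n = read(fd, (char *)buf + done, len - done);
        if (n <= 0)             /* read error or premature end of file */
            break;
        done += (size_t)n;
    }
    close(fd);
    if (done != len) {
        munmap(buf, len);
        return MAP_FAILED;
    }
    return buf;
}
```

The cost of this approach is an extra copy of the data and no sharing with the page cache, which may be among the "other reasons" it does not fit.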
  - Ben

Ben Smith
Google, Inc.


* Re: Google's mm problem - not reproduced on 2.4.13
  2001-11-02 22:42                     ` Ben Smith
@ 2001-11-02 23:15                       ` Daniel Phillips
  2001-11-03 22:53                       ` Adaptec vs Symbios performance Stephan von Krawczynski
  1 sibling, 0 replies; 37+ messages in thread
From: Daniel Phillips @ 2001-11-02 23:15 UTC (permalink / raw)
  To: Andrea Arcangeli; +Cc: Ben Smith, linux-kernel

On November 2, 2001 11:42 pm, Ben Smith wrote:
> As another note, I've re-written my test application to use madvise 
> instead of mlock, on a suggestion from Andrea. It also doesn't work. For 
> 2.4.13, after running for a while, my test app hangs, using one CPU, and 
> kswapd consumes the other CPU. I was eventually able to kill my test app.

OK, while there may be room for debate over whether the mlock problem is a 
bug, there's no question about madvise.  The program still doesn't work if you 
replace the mlocks with madvises (except for the mlock that's used to 
estimate memory size).

--
Daniel


* Adaptec vs Symbios performance
  2001-11-02 22:42                     ` Ben Smith
  2001-11-02 23:15                       ` Daniel Phillips
@ 2001-11-03 22:53                       ` Stephan von Krawczynski
  2001-11-03 23:01                         ` arjan
  1 sibling, 1 reply; 37+ messages in thread
From: Stephan von Krawczynski @ 2001-11-03 22:53 UTC (permalink / raw)
  To: linux-kernel; +Cc: groudier

Hello Justin, hello Gerard                                            
                                                                      
I am currently looking for the reasons for the bad behaviour of the aic7xxx
driver in a shared interrupt setup, and for its generally poor behaviour in
a multitasking environment.
Here is what I found in the code:
                                                                      
/*
 * SCSI controller interrupt handler.
 */
void
ahc_linux_isr(int irq, void *dev_id, struct pt_regs * regs)
{
        struct ahc_softc *ahc;
        struct ahc_cmd *acmd;
        u_long flags;

        ahc = (struct ahc_softc *) dev_id;
        ahc_lock(ahc, &flags);
        ahc_intr(ahc);
        /*
         * It would be nice to run the device queues from a
         * bottom half handler, but as there is no way to
         * dynamically register one, we'll have to postpone
         * that until we get integrated into the kernel.
         */
        ahc_linux_run_device_queues(ahc);
        acmd = TAILQ_FIRST(&ahc->platform_data->completeq);
        TAILQ_INIT(&ahc->platform_data->completeq);
        ahc_unlock(ahc, &flags);
        if (acmd != NULL)
                ahc_linux_run_complete_queue(ahc, acmd);
}

This is nice. I cannot read all of the code around it (it is derived
from aic7xxx_linux.c), but if I understand the naming and comments
correctly, some of the work is done inside the hardware interrupt
handler (where it shouldn't be), which would very much match my tests
showing bad overall performance. Obviously this code is old (read the
comment) and needs reworking.
Comments?

Regards,
Stephan


* Re: Adaptec vs Symbios performance
  2001-11-03 22:53                       ` Adaptec vs Symbios performance Stephan von Krawczynski
@ 2001-11-03 23:01                         ` arjan
  0 siblings, 0 replies; 37+ messages in thread
From: arjan @ 2001-11-03 23:01 UTC (permalink / raw)
  To: Stephan von Krawczynski; +Cc: linux-kernel

In article <200111032253.XAA20342@webserver.ithnet.com> you wrote:

> Hello Justin, hello Gerard                                            
>                                                                      
> I am looking currently for reasons for bad behaviour of aic7xxx driver
> in an shared interrupt setup and general not-nice behaviour of the    
> driver regarding multi-tasking environment.                           
> Here is what I found in the code:                                     

>         * It would be nice to run the device queues from a           
>         * bottom half handler, but as there is no way to             
>         * dynamically register one, we'll have to postpone           
>         * that until we get integrated into the kernel.              
>         */                    

Sounds like a good tasklet candidate.
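A userspace sketch of the split arjan suggests, assuming the interrupt handler should only queue finished commands and leave the completion work to a deferred routine (the job a tasklet would do in the kernel). `struct cmd`, `struct ctrl`, `fake_isr()` and `fake_tasklet()` are hypothetical names, and there is no real locking or interrupt context here; the sketch only shows the detach-then-process pattern that the quoted `ahc_linux_isr()` already applies to `completeq`.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical completed SCSI command: just an id and a link. */
struct cmd {
    int id;
    struct cmd *next;
};

/* Hypothetical per-controller state. */
struct ctrl {
    struct cmd *completeq;   /* commands finished by the hardware */
    bool tasklet_pending;    /* stands in for tasklet_schedule() */
    int completed;           /* commands handed back to the upper layer */
};

/* Top half: do the minimum, i.e. queue the finished command and
 * request deferred processing. No completion work happens here. */
static void fake_isr(struct ctrl *c, struct cmd *done)
{
    done->next = c->completeq;
    c->completeq = done;
    c->tasklet_pending = true;
}

/* Bottom half: detach the whole queue in one step (as the quoted code
 * does with TAILQ_FIRST/TAILQ_INIT), then walk it afterwards. */
static void fake_tasklet(struct ctrl *c)
{
    struct cmd *list;

    if (!c->tasklet_pending)
        return;
    list = c->completeq;      /* in a driver: done under the driver lock */
    c->completeq = NULL;
    c->tasklet_pending = false;

    while (list != NULL) {
        struct cmd *next = list->next;
        c->completed++;       /* a real driver would complete the command here */
        list = next;
    }
}
```

The interrupt path stays short no matter how many commands complete; the cost is some extra latency until the deferred routine runs.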

* Re: Google's mm problem - not reproduced on 2.4.13
  2001-11-02 20:27               ` Linus Torvalds
  2001-11-02 21:08                 ` Ben Smith
@ 2001-11-02 21:12                 ` Rik van Riel
  1 sibling, 0 replies; 37+ messages in thread
From: Rik van Riel @ 2001-11-02 21:12 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-kernel

On Fri, 2 Nov 2001, Linus Torvalds wrote:

> If root wants to shoot himself in the head by mlocking all of memory,
> that's not a VM problem, that's a stupid administrator problem.

The kernel limits the amount of mlock()ed memory to
50% of RAM, so we _should_ be ok.

(yes, this limit is per process, but Daniel only
has one process running anyway)
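A small userspace sketch of the per-process cap Rik describes, assuming it is the smaller of `RLIMIT_MEMLOCK` and half of physical RAM; the real check depends on the kernel version, so treat the result as an estimate. `mlock_budget_bytes()` is a hypothetical helper.

```c
#define _DEFAULT_SOURCE
#include <stdint.h>
#include <sys/resource.h>
#include <unistd.h>

/* Hypothetical helper: estimate how many bytes one process may mlock,
 * assuming the cap is min(RLIMIT_MEMLOCK, physical RAM / 2).
 * Returns 0 if the values cannot be read. */
static uint64_t mlock_budget_bytes(void)
{
    struct rlimit rl;
    long pages = sysconf(_SC_PHYS_PAGES);
    long page_size = sysconf(_SC_PAGESIZE);
    uint64_t half_ram, budget;

    if (pages <= 0 || page_size <= 0 || getrlimit(RLIMIT_MEMLOCK, &rl) != 0)
        return 0;

    half_ram = (uint64_t)pages * (uint64_t)page_size / 2;
    budget = half_ram;
    /* An infinite rlimit leaves only the half-of-RAM cap. */
    if (rl.rlim_cur != RLIM_INFINITY && (uint64_t)rl.rlim_cur < budget)
        budget = (uint64_t)rl.rlim_cur;
    return budget;
}
```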

regards,

Rik
-- 
DMCA, SSSCA, W3C?  Who cares?  http://thefreeworld.net/

http://www.surriel.com/		http://distro.conectiva.com/


* Re: Google's mm problem - not reproduced on 2.4.13
       [not found]               ` <200111022027.fA2KRwe20006@penguin.transmeta.com>
@ 2001-11-02 20:58                 ` Daniel Phillips
  0 siblings, 0 replies; 37+ messages in thread
From: Daniel Phillips @ 2001-11-02 20:58 UTC (permalink / raw)
  To: Sven Heinicke, linux-kernel; +Cc: Ben Smith, Andrea Arcangeli, Rik van Riel

On November 2, 2001 09:27 pm, Linus Torvalds wrote:
> In article <20011102181758Z16039-4784+420@humbolt.nl.linux.org>,
> Daniel Phillips  <phillips@bonn-fries.net> wrote:
> >
> >It's hard to see how that could be wrong.  Plus, this test program does 
> >run under 2.4.9, it just uses way too much CPU on that kernel.  So I'd say 
> >mm bug.
> 
> So how much memory is mlocked?

I'm not sure exactly, I didn't run the test.  I *think* it's just over 50% of 
physical memory.

> The locked memory will stay in the inactive list (it won't even ever be
> activated, because we don't bother even scanning the mapped locked
> regions), and the inactive list fills up with pages that are completely
> worthless. 

Yes, it does various things on various VMs.  On 2.4.9 it stays on the 
inactive list until free memory gets down to rock bottom, then most of it 
moves to the active list and the system reaches a steady state where it can 
operate, though with kswapd taking 99% CPU (on a two-processor system), and 
the test does complete.  On the current kernel it dies.

> And the kernel will decide that because most of the unfreeable pages are
> mapped, it needs to do VM scanning, which obviously doesn't help.
> 
> Why _does_ this thing do mlock, anyway? What's the point? And how much
> does it try to lock?

It's how the Google database engine works: it keeps latency down by mapping 
big database files into memory.  I didn't get more information than that on 
the application.

> If root wants to shoot himself in the head by mlocking all of memory,
> that's not a VM problem, that's a stupid administrator problem.

In the tests I did, it was about 1 gig out of 2.  I'm not sure how much 
memory is mlocked in the 3.5 Gig test, the one that's failing, but it's 
certainly not anything like all of memory.  Really, we should be able to 
mlock 90%+ of memory without falling over.

--
Daniel

* Re: Google's mm problem - not reproduced on 2.4.13
  2001-11-02 17:51         ` Sven Heinicke
  2001-11-02 18:00           ` Andrea Arcangeli
@ 2001-11-02 18:11           ` Daniel Phillips
  2001-11-02 18:48             ` Sven Heinicke
  1 sibling, 1 reply; 37+ messages in thread
From: Daniel Phillips @ 2001-11-02 18:11 UTC (permalink / raw)
  To: Sven Heinicke, linux-kernel; +Cc: Ben Smith, Andrea Arcangeli, Rik van Riel

On November 2, 2001 06:51 pm, Sven Heinicke wrote:
> Ben Smith writes:
>  >  > On October 31, 2001 09:45 pm, Andrea Arcangeli wrote:
>  >  >
>  >  >>On Wed, Oct 31, 2001 at 09:39:12PM +0100, Daniel Phillips wrote:
>  >  >>
>  >  >>>On October 31, 2001 07:06 pm, Daniel Phillips wrote:
>  >  >>>
>  >  >>>>I just tried your test program with 2.4.13, 2 Gig, and it ran
>  >  >>>>without problems.  Could you try that over there and see if you
>  >  >>>>get the same result?  If it does run, the next move would be to
>  >  >>>>check with 3.5 Gig.
>  >  >>>>
>  >  >>>Ben reports that his test with 2 Gig memory runs fine, as it does
>  >  >>>for me, but that it locks up tight with 3.5 Gig, requiring power
>  >  >>>cycle.  Since I only have 2 Gig here I can't reproduce that (yet).
>  >  >>>
>  >  >>are you sure it isn't an oom condition. can you reproduce on
>  >  >>2.4.14pre5aa1? mainline (at least before pre6) could deadlock with
>  >  >>too much mlocked memory.
>  >  >>
>  >  >
>  >  > I don't know, I can't reproduce it here, I don't have enough memory.
>  >  > Ben?
>  > 
>  > My test application gets killed (I believe by the oom handler). dmesg
>  > complains about a lot of 0-order allocation failures. For this test,
>  > I'm running with 2.4.14pre5aa1, 3.5gb of RAM, 2 PIII 1Ghz.
>  >   - Ben
>  > 
>  > Ben Smith
>  > Google, Inc
>  > 
> 
> 
> This is a system with 4G of memory and regular swap, with two Pentium
> III 1GHz processors.
> 
> On 2.4.14-pre6aa1 it happily runs until:
> 
> munmap'ed 7317d000
> Loading data at 7317d000 for slot 2
> Load (/mnt/sdb/sven/chunk10) succeeded!
> mlocking slot 2, 7317d000
> mlocking at 7317d000 of size 1048576
> Connection to hera closed by remote host.
> Connection to hera closed.
> 
> At that point it kills my ssh and other programs and fills my
> /var/log/messages with:
> 
> Nov  2 11:29:07 ps2 kernel: __alloc_pages: 0-order allocation failed (gfp=0x1d2/0)
> Nov  2 11:29:07 ps2 syslogd: select: Cannot allocate memory
> Nov  2 11:29:07 ps2 kernel: __alloc_pages: 0-order allocation failed (gfp=0x1d2/0)
> Nov  2 11:29:07 ps2 kernel: __alloc_pages: 0-order allocation failed (gfp=0x1f0/0)
> Nov  2 11:29:07 ps2 last message repeated 2 times
> 
> a bunch of times.  It then doesn't free the mmapped memory until the file
> system is unmounted.

Not freeing the memory is expected and normal.  The previously-mlocked file 
data remains cached in that memory, and even though it's not free, it's 
'easily freeable' so there's no smoking gun there.  The reason the memory is 
freed on umount is, there's no possibility that that file data can be 
referenced again and it makes sense to free it up immediately.

On the other hand, the 0-order failures and oom-kills indicate a genuine bug.

> It never starts going into swap.
> 
> 2.4.14-pre5aa1 does about the same.

--
Daniel

* Re: Google's mm problem - not reproduced on 2.4.13
  2001-11-02 18:11           ` Daniel Phillips
@ 2001-11-02 18:48             ` Sven Heinicke
  2001-11-02 18:57               ` Daniel Phillips
  0 siblings, 1 reply; 37+ messages in thread
From: Sven Heinicke @ 2001-11-02 18:48 UTC (permalink / raw)
  To: linux-kernel; +Cc: Daniel Phillips, Ben Smith, Andrea Arcangeli, Rik van Riel


 > Not freeing the memory is expected and normal.  The previously-mlocked file 
 > data remains cached in that memory, and even though it's not free, it's 
 > 'easily freeable' so there's no smoking gun there.  The reason the memory is 
 > freed on umount is, there's no possibility that that file data can be 
 > referenced again and it makes sense to free it up immediately.

That's cool and all, but how do I free up the memory without unmounting the
partition?

Also, I just tried 2.4.14-pre7.  It acted the same way as 2.4.13 does,
requiring the reset key to continue.

	  Sven

* Re: Google's mm problem - not reproduced on 2.4.13
  2001-11-02 18:48             ` Sven Heinicke
@ 2001-11-02 18:57               ` Daniel Phillips
  0 siblings, 0 replies; 37+ messages in thread
From: Daniel Phillips @ 2001-11-02 18:57 UTC (permalink / raw)
  To: Sven Heinicke, linux-kernel; +Cc: Ben Smith, Andrea Arcangeli, Rik van Riel

On November 2, 2001 07:48 pm, Sven Heinicke wrote:
>  > Not freeing the memory is expected and normal.  The previously-mlocked file 
>  > data remains cached in that memory, and even though it's not free, it's 
>  > 'easily freeable' so there's no smoking gun there.  The reason the memory is 
>  > freed on umount is, there's no possibility that that file data can be 
>  > referenced again and it makes sense to free it up immediately.
> 
> That's cool and all, but how do I free up the memory without unmounting the
> partition?

You don't; that's the mm's job.  It tries to do it at the last minute, when
it's sure the memory is needed for something more important.
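To watch the mm doing this, one option is to compare the `MemFree` and `Cached` lines of `/proc/meminfo` before and after a run: file data that is no longer locked but still cached is counted under `Cached`, not `MemFree`. The parser below is a hypothetical helper that works on text already read from that file, assuming the usual `Key:   value kB` line format, so it can be exercised without a live system.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: return the value (in kB) for `key` in text in
 * /proc/meminfo format, e.g. "Cached:   123456 kB", or -1 if absent. */
static long meminfo_kb(const char *text, const char *key)
{
    size_t klen = strlen(key);
    const char *line = text;

    while (line != NULL && *line != '\0') {
        long value;
        /* Match the key at the start of a line and require the colon
         * right after it, so "Active" does not match "Active(anon)". */
        if (strncmp(line, key, klen) == 0 && line[klen] == ':' &&
            sscanf(line + klen + 1, "%ld", &value) == 1)
            return value;
        line = strchr(line, '\n');
        if (line != NULL)
            line++;           /* skip to the start of the next line */
    }
    return -1;
}
```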

> Also, I just tried 2.4.14-pre7.  It acted the same way as 2.4.13 does,
> requiring the reset key to continue.

--
Daniel

* Re: Google's mm problem - not reproduced on 2.4.13
  2001-10-31 20:45   ` Andrea Arcangeli
  2001-10-31 21:03     ` Daniel Phillips
@ 2001-11-01  0:19     ` Daniel Phillips
  2001-11-01  0:29       ` Andrea Arcangeli
  2001-11-01  1:17       ` Ben Smith
  1 sibling, 2 replies; 37+ messages in thread
From: Daniel Phillips @ 2001-11-01  0:19 UTC (permalink / raw)
  To: Andrea Arcangeli, Ben Smith; +Cc: linux-kernel, Rik van Riel

On October 31, 2001 09:45 pm, Andrea Arcangeli wrote:
> On Wed, Oct 31, 2001 at 09:39:12PM +0100, Daniel Phillips wrote:
> > On October 31, 2001 07:06 pm, Daniel Phillips wrote:
> > > I just tried your test program with 2.4.13, 2 Gig, and it ran without 
> > > problems.  Could you try that over there and see if you get the same result?
> > > If it does run, the next move would be to check with 3.5 Gig.
> > 
> > Ben reports that his test with 2 Gig memory runs fine, as it does for me, but 
> > that it locks up tight with 3.5 Gig, requiring power cycle.  Since I only 
> > have 2 Gig here I can't reproduce that (yet).
> 
> are you sure it isn't an oom condition.

The way the test code works is, it keeps mlocking more blocks of memory until 
one of the mlocks fails, and then it does the rest of its work with that many 
blocks of memory.  It's hard to see how we could get a legitimate oom with 
that strategy.
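The strategy Daniel describes can be sketched as follows. This is not Ben's program: it assumes anonymous mappings of a hypothetical fixed `BLOCK_SIZE` instead of file-backed chunks, and it stops at the first failed `mlock()` or at a caller-supplied cap, so it is safe to run on an ordinary machine.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <sys/mman.h>

#define BLOCK_SIZE (64 * 1024)   /* hypothetical block size */

/* Map and lock blocks until mlock() fails or max_blocks is reached.
 * Locked block addresses go into blocks[]; returns how many succeeded. */
static size_t lock_until_failure(void **blocks, size_t max_blocks)
{
    size_t n = 0;

    while (n < max_blocks) {
        void *p = mmap(NULL, BLOCK_SIZE, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED)
            break;
        if (mlock(p, BLOCK_SIZE) != 0) {   /* ENOMEM/EPERM/EAGAIN: stop here */
            munmap(p, BLOCK_SIZE);
            break;
        }
        blocks[n++] = p;
    }
    return n;
}

/* Undo lock_until_failure(): munmap() also removes the lock. */
static void release_blocks(void **blocks, size_t n)
{
    for (size_t i = 0; i < n; i++)
        munmap(blocks[i], BLOCK_SIZE);
}
```

The "one less than the maximum" retry corresponds to calling `lock_until_failure()` once to find `n`, releasing the blocks, and calling it again with `max_blocks` set to `n - 1`.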

> can you reproduce on
> 2.4.14pre5aa1? mainline (at least before pre6) could deadlock with too
> much mlocked memory.

OK, he tried it with pre5aa1:

ben> My test application gets killed (I believe by the oom handler). dmesg
ben> complains about a lot of 0-order allocation failures. For this test,
ben> I'm running with 2.4.14pre5aa1, 3.5gb of RAM, 2 PIII 1Ghz.

*Just in case* it's oom-related I've asked Ben to try it with one less than 
the maximum number of memory blocks he can allocate.

If it does turn out to be oom, it's still a bug, right?

--
Daniel

* Re: Google's mm problem - not reproduced on 2.4.13
  2001-11-01  0:19     ` Daniel Phillips
@ 2001-11-01  0:29       ` Andrea Arcangeli
  2001-11-01  1:17       ` Ben Smith
  1 sibling, 0 replies; 37+ messages in thread
From: Andrea Arcangeli @ 2001-11-01  0:29 UTC (permalink / raw)
  To: Daniel Phillips; +Cc: Ben Smith, linux-kernel, Rik van Riel

On Thu, Nov 01, 2001 at 01:19:15AM +0100, Daniel Phillips wrote:
> If it does turn out to be oom, it's still a bug, right?

The testcase I checked a few weeks ago looked correct, so whatever it
is, it should be a kernel bug.

Andrea

* Re: Google's mm problem - not reproduced on 2.4.13
  2001-11-01  0:19     ` Daniel Phillips
  2001-11-01  0:29       ` Andrea Arcangeli
@ 2001-11-01  1:17       ` Ben Smith
  2001-11-01  1:41         ` Rik van Riel
  1 sibling, 1 reply; 37+ messages in thread
From: Ben Smith @ 2001-11-01  1:17 UTC (permalink / raw)
  To: Daniel Phillips; +Cc: Andrea Arcangeli, linux-kernel, Rik van Riel

> *Just in case* it's oom-related I've asked Ben to try it with one less than 
> the maximum number of memory blocks he can allocate.


I've run this test on my 3.5G machine with 3 blocks instead of 4,
and it has the same behavior (my app gets killed, there are 0-order
allocation failures, and the system stays up).
  - Ben

Ben Smith
Google, Inc



* Re: Google's mm problem - not reproduced on 2.4.13
  2001-11-01  1:17       ` Ben Smith
@ 2001-11-01  1:41         ` Rik van Riel
  2001-11-01  1:55           ` Ben Smith
  0 siblings, 1 reply; 37+ messages in thread
From: Rik van Riel @ 2001-11-01  1:41 UTC (permalink / raw)
  To: Ben Smith; +Cc: Daniel Phillips, Andrea Arcangeli, linux-kernel

On Wed, 31 Oct 2001, Ben Smith wrote:

> > *Just in case* it's oom-related I've asked Ben to try it with one less than
> > the maximum number of memory blocks he can allocate.
>
> I've run this test with my 3.5G machine, 3 blocks instead of 4 blocks,
> and it has the same behavior (my app gets killed, 0-order allocation
> failures, and the system stays up.

If you still have swap free at the point where the process
gets killed, or if the memory is file-backed, then we are
positive it's a kernel bug.

regards,

Rik
-- 
DMCA, SSSCA, W3C?  Who cares?  http://thefreeworld.net/

http://www.surriel.com/		http://distro.conectiva.com/


* Re: Google's mm problem - not reproduced on 2.4.13
  2001-11-01  1:41         ` Rik van Riel
@ 2001-11-01  1:55           ` Ben Smith
  2001-11-01  2:06             ` Andrea Arcangeli
  0 siblings, 1 reply; 37+ messages in thread
From: Ben Smith @ 2001-11-01  1:55 UTC (permalink / raw)
  To: Rik van Riel; +Cc: Daniel Phillips, Andrea Arcangeli, linux-kernel

>>>*Just in case* it's oom-related I've asked Ben to try it with one less than
>>>the maximum number of memory blocks he can allocate.
>>>
>>I've run this test with my 3.5G machine, 3 blocks instead of 4 blocks,
>>and it has the same behavior (my app gets killed, 0-order allocation
>>failures, and the system stays up.
>>
> 
> If you still have swap free at the point where the process
> gets killed, or if the memory is file-backed, then we are
> positive it's a kernel bug.

This machine is configured without a swap file. The memory is file-backed,
though (a read-only mmap, followed by an mlock).

  - Ben

Ben Smith
Google, Inc




* Re: Google's mm problem - not reproduced on 2.4.13
  2001-11-01  1:55           ` Ben Smith
@ 2001-11-01  2:06             ` Andrea Arcangeli
  0 siblings, 0 replies; 37+ messages in thread
From: Andrea Arcangeli @ 2001-11-01  2:06 UTC (permalink / raw)
  To: Ben Smith; +Cc: Rik van Riel, Daniel Phillips, linux-kernel

On Wed, Oct 31, 2001 at 05:55:25PM -0800, Ben Smith wrote:
> >>>*Just in case* it's oom-related I've asked Ben to try it with one less than
> >>>the maximum number of memory blocks he can allocate.
> >>>
> >>I've run this test with my 3.5G machine, 3 blocks instead of 4 blocks,
> >>and it has the same behavior (my app gets killed, 0-order allocation
> >>failures, and the system stays up.
> >>
> > 
> > If you still have swap free at the point where the process
> > gets killed, or if the memory is file-backed, then we are
> > positive it's a kernel bug.
> 
> This machine is configured without a swap file. The memory is file backed, 

OK, fine on this side. So again, what's happening is the equivalent of
mlock leaving those mappings locked: it seems the previous mlock is
preventing the cache from being released. Otherwise I don't see why the
kernel wouldn't release the cache correctly. So it could be an mlock
bug in the kernel.
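A minimal sketch of the pattern Ben describes (a read-only, file-backed mapping that is then locked) together with the teardown Andrea is concerned with. The helper names are hypothetical and the real test's file layout is unknown. Because the mapping is read-only and file-backed, its pages never need swap: once `munlock()` or `munmap()` has run, the kernel should be able to drop them from the page cache.

```c
#define _DEFAULT_SOURCE
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: map `path` read-only and try to lock it.
 * Returns the mapping (or NULL) and stores its length in *len.
 * *locked reports whether mlock() succeeded; the mapping stays
 * usable either way. */
static void *map_and_lock(const char *path, size_t *len, int *locked)
{
    struct stat st;
    void *p;
    int fd = open(path, O_RDONLY);

    *locked = 0;
    if (fd < 0)
        return NULL;
    if (fstat(fd, &st) != 0 || st.st_size == 0) {
        close(fd);
        return NULL;
    }
    *len = (size_t)st.st_size;
    p = mmap(NULL, *len, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);                /* the mapping keeps its own file reference */
    if (p == MAP_FAILED)
        return NULL;
    *locked = (mlock(p, *len) == 0);
    return p;
}

/* Unlock and unmap; afterwards the cached file pages are freeable. */
static void unlock_and_unmap(void *p, size_t len, int locked)
{
    if (locked)
        munlock(p, len);
    munmap(p, len);
}
```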

Andrea

* Re: Google's mm problem - not reproduced on 2.4.13
  2001-10-31 20:39 ` Daniel Phillips
  2001-10-31 20:45   ` Andrea Arcangeli
@ 2001-10-31 20:48   ` Rik van Riel
  2001-10-31 21:04     ` Daniel Phillips
  1 sibling, 1 reply; 37+ messages in thread
From: Rik van Riel @ 2001-10-31 20:48 UTC (permalink / raw)
  To: Daniel Phillips; +Cc: linux-kernel, Andrea Arcangeli

On Wed, 31 Oct 2001, Daniel Phillips wrote:
> On October 31, 2001 07:06 pm, Daniel Phillips wrote:
> > I just tried your test program with 2.4.13, 2 Gig, and it ran without
> > problems.  Could you try that over there and see if you get the same result?
> > If it does run, the next move would be to check with 3.5 Gig.
>
> Ben reports that his test with 2 Gig memory runs fine, as it does for
> me, but that it locks up tight with 3.5 Gig, requiring power cycle.
> Since I only have 2 Gig here I can't reproduce that (yet).

Does it lock up if your low memory is reduced to 512 MB ?

Rik
-- 
DMCA, SSSCA, W3C?  Who cares?  http://thefreeworld.net/

http://www.surriel.com/		http://distro.conectiva.com/


* Re: Google's mm problem - not reproduced on 2.4.13
  2001-10-31 20:48   ` Rik van Riel
@ 2001-10-31 21:04     ` Daniel Phillips
  2001-10-31 21:08       ` Rik van Riel
  0 siblings, 1 reply; 37+ messages in thread
From: Daniel Phillips @ 2001-10-31 21:04 UTC (permalink / raw)
  To: Rik van Riel; +Cc: linux-kernel, Andrea Arcangeli, Ben Smith

On October 31, 2001 09:48 pm, Rik van Riel wrote:
> On Wed, 31 Oct 2001, Daniel Phillips wrote:
> > On October 31, 2001 07:06 pm, Daniel Phillips wrote:
> > > I just tried your test program with 2.4.13, 2 Gig, and it ran without
> > > problems.  Could you try that over there and see if you get the same result?
> > > If it does run, the next move would be to check with 3.5 Gig.
> >
> > Ben reports that his test with 2 Gig memory runs fine, as it does for
> > me, but that it locks up tight with 3.5 Gig, requiring power cycle.
> > Since I only have 2 Gig here I can't reproduce that (yet).
> 
> Does it lock up if your low memory is reduced to 512 MB ?

Ben?

--
Daniel

* Re: Google's mm problem - not reproduced on 2.4.13
  2001-10-31 21:04     ` Daniel Phillips
@ 2001-10-31 21:08       ` Rik van Riel
  0 siblings, 0 replies; 37+ messages in thread
From: Rik van Riel @ 2001-10-31 21:08 UTC (permalink / raw)
  To: Daniel Phillips; +Cc: linux-kernel, Andrea Arcangeli, Ben Smith

On Wed, 31 Oct 2001, Daniel Phillips wrote:
> On October 31, 2001 09:48 pm, Rik van Riel wrote:
> > On Wed, 31 Oct 2001, Daniel Phillips wrote:

> > > Ben reports that his test with 2 Gig memory runs fine, as it does for
> > > me, but that it locks up tight with 3.5 Gig, requiring power cycle.
> > > Since I only have 2 Gig here I can't reproduce that (yet).
> >
> > Does it lock up if your low memory is reduced to 512 MB ?
>
> Ben?

Nonono, I mean that if _you_ reduce low memory to 512MB
on your 2GB machine, maybe you can reproduce the problem
more easily.

If the Google people try this with larger machines, it'll
almost certainly make triggering the bug even easier ;)

regards,

Rik
-- 
DMCA, SSSCA, W3C?  Who cares?  http://thefreeworld.net/

http://www.surriel.com/		http://distro.conectiva.com/


* Re: Google's mm problem - not reproduced on 2.4.13
       [not found] <Pine.LNX.4.33.0111021250560.20078-100000@penguin.transmeta.com>
@ 2001-11-02 21:13 ` Linus Torvalds
  2001-11-02 21:27   ` Stephan von Krawczynski
  0 siblings, 1 reply; 37+ messages in thread
From: Linus Torvalds @ 2001-11-02 21:13 UTC (permalink / raw)
  To: Kernel Mailing List; +Cc: Daniel Phillips

[ Slightly updated version of earlier private email ]

On Fri, 2 Nov 2001, Daniel Phillips wrote:
>
> Yes, it does various things on various VMs.  On 2.4.9 it stays on the
> inactive list until free memory gets down to rock bottom, then most of it
> moves to the active list and the system reaches a steady state where it can
> operate, though with kswapd taking 99% CPU (on a two-processor system), and
> the test does complete.  On the current kernel it dies.

On the 2.4.9 kernel, the "active" list is completely and utterly misnamed.

We move random pages to the active list, for random reasons. One of the
random reasons we have is "this page is mapped". Which has nothing to do
with activeness. The "active" list might as well have been called
"random_list_two".

In the new VM, only _active_ pages get moved to the active list. So the
mlocked pages will stay on the inactive list until somebody says they are
active. And right now nobody will ever say that they are active, because
we don't even scan the locked areas.

And the advantage of the non-random approach is that in the new VM, we can
_use_ the knowledge that the inactive list has filled up with mapped pages
to make a _useful_ decision: we decide that we need to start scanning the
VM tree and try to remove pages from the mappings.

Notice? No more "random decisions". We have a well-defined point where we
can say "Ok, our inactive list seems to be mostly mapped, so let's try to
unmap something".

In short, 2.4.9 handles the test because it does everything else wrong.

2.4.13, on the other hand, doesn't handle the test well, because the VM says
"there's a _lot_ of inactive mapped pages, I need to _do_ something about it".
And then vmscanning doesn't actually do anything.

Suggested patch appended.

> In the tests I did, it was about 1 gig out of 2.  I'm not sure how much
> memory is mlocked in the 3.5 Gig test, the one that's failing, but it's
> certainly not anything like all of memory.  Really, we should be able to
> mlock 90%+ of memory without falling over.

Not a way in hell, for many reasons, and none of them have anything to do
with this particular problem.

The most _trivial_ reason is that if you lock more than 900MB of memory,
that locked area may well be all of the lowmem pages, and you're now
screwed forever. Dead, dead, dead.

(And I can come up with loads that do exactly the above: it's easy enough
to try to first allocate up all of highmem, and then do a mlock and try to
allocate up all of lowmem locked. It's even easier if you use loopback or
something that only wants to allocate lowmem in the first place).

In short, we MUST NOT mlock more than maybe 500MB _tops_ on Intel. If we
ever do, our survival is pretty random, regardless of other VM issues.
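To put numbers on the lowmem point, assume the default i386 layout of this era: a 1 GB kernel window with 128 MB of it reserved for vmalloc and similar mappings, leaving at most 896 MB of directly mapped lowmem, with everything above that treated as highmem. `split_memory()` is a hypothetical worked example of that arithmetic, not kernel code.

```c
/* Worked example of the i386 lowmem/highmem split, assuming a
 * 1 GB kernel window with a 128 MB vmalloc reserve (896 MB lowmem). */
#define LOWMEM_MAX_MB (1024u - 128u)

struct mem_split {
    unsigned lowmem_mb;
    unsigned highmem_mb;
};

static struct mem_split split_memory(unsigned total_mb)
{
    struct mem_split s;

    /* Lowmem is capped; whatever is left over becomes highmem. */
    s.lowmem_mb = total_mb < LOWMEM_MAX_MB ? total_mb : LOWMEM_MAX_MB;
    s.highmem_mb = total_mb - s.lowmem_mb;
    return s;
}
```

Under these assumptions both the 2 GB and the 3.5 GB machines in this thread have the same 896 MB of lowmem (the larger box only adds highmem: 2688 MB versus 1152 MB), so the roughly 1 GB mlocked in Daniel's test is already larger than the entire lowmem zone.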

The appended patch should fix the unintentional problem, though.

		Linus

----
diff -u --recursive --new-file penguin/linux/mm/vmscan.c linux/mm/vmscan.c
--- penguin/linux/mm/vmscan.c	Thu Nov  1 17:59:12 2001
+++ linux/mm/vmscan.c	Fri Nov  2 13:10:58 2001
@@ -49,7 +49,7 @@
 	swp_entry_t entry;

 	/* Don't look at this pte if it's been accessed recently. */
-	if (ptep_test_and_clear_young(page_table)) {
+	if ((vma->vm_flags & VM_LOCKED) || ptep_test_and_clear_young(page_table)) {
 		mark_page_accessed(page);
 		return 0;
 	}
@@ -220,8 +220,8 @@
 	pgd_t *pgdir;
 	unsigned long end;

-	/* Don't swap out areas which are locked down */
-	if (vma->vm_flags & (VM_LOCKED|VM_RESERVED))
+	/* Don't swap out areas which are reserved */
+	if (vma->vm_flags & VM_RESERVED)
 		return count;

 	pgdir = pgd_offset(mm, address);

* Re: Google's mm problem - not reproduced on 2.4.13
  2001-11-02 21:13 ` Linus Torvalds
@ 2001-11-02 21:27   ` Stephan von Krawczynski
  2001-11-03  0:16     ` Linus Torvalds
  0 siblings, 1 reply; 37+ messages in thread
From: Stephan von Krawczynski @ 2001-11-02 21:27 UTC (permalink / raw)
  To: linux-kernel; +Cc: phillips

On Fri, 2 Nov 2001 13:13:10 -0800 (PST) Linus Torvalds <torvalds@transmeta.com>
wrote:

> -	/* Don't swap out areas which are locked down */
> -	if (vma->vm_flags & (VM_LOCKED|VM_RESERVED))
> +	/* Don't swap out areas which are reserved */
> +	if (vma->vm_flags & VM_RESERVED)
>  		return count;

Although I agree with what you said about the differences between the old and
new VM, I believe the above is not really what Ben intended by mlocking. I
mean, with this change you do swap them out, or not?

Regards,
Stephan


* Re: Google's mm problem - not reproduced on 2.4.13
  2001-11-02 21:27   ` Stephan von Krawczynski
@ 2001-11-03  0:16     ` Linus Torvalds
  0 siblings, 0 replies; 37+ messages in thread
From: Linus Torvalds @ 2001-11-03  0:16 UTC (permalink / raw)
  To: linux-kernel

In article <20011102222754.2366f1f5.skraw@ithnet.com>,
Stephan von Krawczynski  <skraw@ithnet.com> wrote:
>
>> -	/* Don't swap out areas which are locked down */
>> -	if (vma->vm_flags & (VM_LOCKED|VM_RESERVED))
>> +	/* Don't swap out areas which are reserved */
>> +	if (vma->vm_flags & VM_RESERVED)
>>  		return count;
>
>Although I agree with what you said about the differences between the old and
>new VM, I believe the above is not really what Ben intended by mlocking. I
>mean, with this change you do swap them out, or not?

Not. See where I added the VM_LOCKED test - deep down in the page-out
code it will decide that a VM_LOCKED page is always accessed, and will
move it to the active list instead of swapping it out.
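A toy model of the change, assuming a much simplified page-out path: one pass over a mapping's pages, activation as a single step (the kernel's `mark_page_accessed()` and list handling are more involved), and hypothetical names throughout. Before the patch, locked mappings are skipped, so their pages stay mapped on the inactive list; with the patch they are visited and promoted to the active list, but never unmapped.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy page state; not the kernel's struct page. */
struct toy_page {
    bool vm_locked;  /* belongs to a VM_LOCKED mapping */
    bool young;      /* pte "accessed" bit */
    bool active;     /* on the active list */
    bool mapped;     /* still mapped by the process */
};

/* One pass of the toy page-out scan over a mapping's pages. */
static void toy_scan(struct toy_page *pg, size_t n, bool patched)
{
    for (size_t i = 0; i < n; i++) {
        struct toy_page *p = &pg[i];

        if (!p->mapped)
            continue;
        if (!patched && p->vm_locked)
            continue;                 /* old: locked mapping skipped */
        if (patched && p->vm_locked) {
            p->active = true;         /* new: treated as recently accessed */
            continue;
        }
        if (p->young) {
            p->young = false;         /* test-and-clear the accessed bit */
            p->active = true;
            continue;
        }
        p->mapped = false;            /* unmapped: page can now be freed */
    }
}

/* Pages that clog the inactive list: mapped but never activated. */
static size_t inactive_mapped(const struct toy_page *pg, size_t n)
{
    size_t count = 0;

    for (size_t i = 0; i < n; i++)
        if (pg[i].mapped && !pg[i].active)
            count++;
    return count;
}
```

The asserts one would write against this mirror Linus's answer to Stephan: the patched scan does not page out locked memory, it only stops it from clogging the inactive list.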

		Linus

Thread overview: 37+ messages
2001-10-31 18:06 Google's mm problem - not reproduced on 2.4.13 Daniel Phillips
2001-10-31 20:39 ` Daniel Phillips
2001-10-31 20:45   ` Andrea Arcangeli
2001-10-31 21:03     ` Daniel Phillips
2001-10-31 21:53       ` Andreas Dilger
2001-11-01  4:52         ` Daniel Phillips
2001-11-01 16:56           ` undefined reference in 2.2.19 build with Reiserfs (was: Google's mm problem - not reproduced on 2.4.13) Sven Heinicke
2001-11-01 22:39             ` Keith Owens
2001-10-31 22:12       ` Google's mm problem - not reproduced on 2.4.13 Ben Smith
2001-11-01  0:34         ` Andrea Arcangeli
2001-11-02 17:51         ` Sven Heinicke
2001-11-02 18:00           ` Andrea Arcangeli
2001-11-02 18:19             ` Daniel Phillips
2001-11-02 20:27               ` Linus Torvalds
2001-11-02 21:08                 ` Ben Smith
2001-11-02 21:20                   ` Linus Torvalds
2001-11-02 22:42                     ` Ben Smith
2001-11-02 23:15                       ` Daniel Phillips
2001-11-03 22:53                       ` Adaptec vs Symbios performance Stephan von Krawczynski
2001-11-03 23:01                         ` arjan
2001-11-02 21:12                 ` Google's mm problem - not reproduced on 2.4.13 Rik van Riel
     [not found]               ` <200111022027.fA2KRwe20006@penguin.transmeta.com>
2001-11-02 20:58                 ` Daniel Phillips
2001-11-02 18:11           ` Daniel Phillips
2001-11-02 18:48             ` Sven Heinicke
2001-11-02 18:57               ` Daniel Phillips
2001-11-01  0:19     ` Daniel Phillips
2001-11-01  0:29       ` Andrea Arcangeli
2001-11-01  1:17       ` Ben Smith
2001-11-01  1:41         ` Rik van Riel
2001-11-01  1:55           ` Ben Smith
2001-11-01  2:06             ` Andrea Arcangeli
2001-10-31 20:48   ` Rik van Riel
2001-10-31 21:04     ` Daniel Phillips
2001-10-31 21:08       ` Rik van Riel
     [not found] <Pine.LNX.4.33.0111021250560.20078-100000@penguin.transmeta.com>
2001-11-02 21:13 ` Linus Torvalds
2001-11-02 21:27   ` Stephan von Krawczynski
2001-11-03  0:16     ` Linus Torvalds
