* Google's mm problem - not reproduced on 2.4.13 @ 2001-10-31 18:06 Daniel Phillips 2001-10-31 20:39 ` Daniel Phillips 0 siblings, 1 reply; 37+ messages in thread From: Daniel Phillips @ 2001-10-31 18:06 UTC (permalink / raw) To: linux-kernel; +Cc: Rik van Riel, Andrea Arcangeli Good morning Ben, I just tried your test program with 2.4.13, 2 Gig, and it ran without problems. Could you try that over there and see if you get the same result? If it does run, the next move would be to check with 3.5 Gig. Regards, Daniel ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: Google's mm problem - not reproduced on 2.4.13 2001-10-31 18:06 Google's mm problem - not reproduced on 2.4.13 Daniel Phillips @ 2001-10-31 20:39 ` Daniel Phillips 2001-10-31 20:45 ` Andrea Arcangeli 2001-10-31 20:48 ` Rik van Riel 0 siblings, 2 replies; 37+ messages in thread From: Daniel Phillips @ 2001-10-31 20:39 UTC (permalink / raw) To: linux-kernel; +Cc: Rik van Riel, Andrea Arcangeli On October 31, 2001 07:06 pm, Daniel Phillips wrote: > I just tried your test program with 2.4.13, 2 Gig, and it ran without > problems. Could you try that over there and see if you get the same result? > If it does run, the next move would be to check with 3.5 Gig. Ben reports that his test with 2 Gig memory runs fine, as it does for me, but that it locks up tight with 3.5 Gig, requiring power cycle. Since I only have 2 Gig here I can't reproduce that (yet). -- Daniel ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: Google's mm problem - not reproduced on 2.4.13 2001-10-31 20:39 ` Daniel Phillips @ 2001-10-31 20:45 ` Andrea Arcangeli 2001-10-31 21:03 ` Daniel Phillips 2001-11-01 0:19 ` Daniel Phillips 2001-10-31 20:48 ` Rik van Riel 1 sibling, 2 replies; 37+ messages in thread From: Andrea Arcangeli @ 2001-10-31 20:45 UTC (permalink / raw) To: Daniel Phillips; +Cc: linux-kernel, Rik van Riel On Wed, Oct 31, 2001 at 09:39:12PM +0100, Daniel Phillips wrote: > On October 31, 2001 07:06 pm, Daniel Phillips wrote: > > I just tried your test program with 2.4.13, 2 Gig, and it ran without > > problems. Could you try that over there and see if you get the same result? > > If it does run, the next move would be to check with 3.5 Gig. > > Ben reports that his test with 2 Gig memory runs fine, as it does for me, but > that it locks up tight with 3.5 Gig, requiring power cycle. Since I only > have 2 Gig here I can't reproduce that (yet). are you sure it isn't an oom condition. can you reproduce on 2.4.14pre5aa1? mainline (at least before pre6) could deadlock with too much mlocked memory. Andrea ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: Google's mm problem - not reproduced on 2.4.13 2001-10-31 20:45 ` Andrea Arcangeli @ 2001-10-31 21:03 ` Daniel Phillips 2001-10-31 21:53 ` Andreas Dilger 2001-10-31 22:12 ` Google's mm problem - not reproduced on 2.4.13 Ben Smith 2001-11-01 0:19 ` Daniel Phillips 1 sibling, 2 replies; 37+ messages in thread From: Daniel Phillips @ 2001-10-31 21:03 UTC (permalink / raw) To: Andrea Arcangeli; +Cc: linux-kernel, Rik van Riel, Ben Smith On October 31, 2001 09:45 pm, Andrea Arcangeli wrote: > On Wed, Oct 31, 2001 at 09:39:12PM +0100, Daniel Phillips wrote: > > On October 31, 2001 07:06 pm, Daniel Phillips wrote: > > > I just tried your test program with 2.4.13, 2 Gig, and it ran without > > > problems. Could you try that over there and see if you get the same result? > > > If it does run, the next move would be to check with 3.5 Gig. > > > > Ben reports that his test with 2 Gig memory runs fine, as it does for me, but > > that it locks up tight with 3.5 Gig, requiring power cycle. Since I only > > have 2 Gig here I can't reproduce that (yet). > > are you sure it isn't an oom condition. can you reproduce on > 2.4.14pre5aa1? mainline (at least before pre6) could deadlock with too > much mlocked memory. I don't know, I can't reproduce it here, I don't have enough memory. Ben? -- Daniel ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: Google's mm problem - not reproduced on 2.4.13 2001-10-31 21:03 ` Daniel Phillips @ 2001-10-31 21:53 ` Andreas Dilger 2001-11-01 4:52 ` Daniel Phillips 2001-10-31 22:12 ` Google's mm problem - not reproduced on 2.4.13 Ben Smith 1 sibling, 1 reply; 37+ messages in thread
From: Andreas Dilger @ 2001-10-31 21:53 UTC
To: Daniel Phillips; +Cc: linux-kernel

On Oct 31, 2001 22:03 +0100, Daniel Phillips wrote:
> Ben reports that his test with 2 Gig memory runs fine, as it does for me, but
> that it locks up tight with 3.5 Gig, requiring power cycle. Since I only
> have 2 Gig here I can't reproduce that (yet).

Sadly, I bought some memory yesterday, and it was only U$30 for 256MB DIMMs, so $120/GB if you have enough slots. Not that I'm suggesting you go out and buy more memory Daniel, as you probably have your slots filled with 2GB, and larger sticks are still a bit more expensive.

The only thing that bugs me about the low memory price is that Windows XP recommends at least 128MB for a workable system. A year or two ago that would have been considered a bloated pig, but now they are giving away 128MB DIMMs with a purchase of XP. Sad, really.

Maybe M$ is subsidizing the chipmakers to make RAM cheap so XP can run on people's computers ;-)? What else would you do with U$50 billion in cash (or whatever) that M$ has?

Cheers, Andreas
--
Andreas Dilger
http://sourceforge.net/projects/ext2resize/
http://www-mddsp.enel.ucalgary.ca/People/adilger/
* Re: Google's mm problem - not reproduced on 2.4.13 2001-10-31 21:53 ` Andreas Dilger @ 2001-11-01 4:52 ` Daniel Phillips 2001-11-01 16:56 ` undefined reference in 2.2.19 build with Reiserfs (was: Google's mm problem - not reproduced on 2.4.13) Sven Heinicke 0 siblings, 1 reply; 37+ messages in thread From: Daniel Phillips @ 2001-11-01 4:52 UTC (permalink / raw) To: Andreas Dilger; +Cc: linux-kernel On October 31, 2001 10:53 pm, Andreas Dilger wrote: > On Oct 31, 2001 22:03 +0100, Daniel Phillips wrote: > > Ben reports that his test with 2 Gig memory runs fine, as it does for me, but > > that it locks up tight with 3.5 Gig, requiring power cycle. Since I only > > have 2 Gig here I can't reproduce that (yet). > > Sadly, I bought some memory yesterday, and it was only U$30 for 256MB > DIMMs, so $120/GB if you have enough slots. Not that I'm suggesting > you go out and but more memory Daniel, as you probably have your slots > filled with 2GB, and larger sticks are still bit more expesive. You're not kidding. Just FYI, four 1GB sticks for this machine will set you back a kilobuck. (PC/133 Registered SDRAM 72-bit ECC, 168-pin gold-plated DIMM) -- Daniel ^ permalink raw reply [flat|nested] 37+ messages in thread
* undefined reference in 2.2.19 build with Reiserfs (was: Google's mm problem - not reproduced on 2.4.13) 2001-11-01 4:52 ` Daniel Phillips @ 2001-11-01 16:56 ` Sven Heinicke 2001-11-01 22:39 ` Keith Owens 0 siblings, 1 reply; 37+ messages in thread
From: Sven Heinicke @ 2001-11-01 16:56 UTC
To: linux-kernel

Hi,

We have a couple of Dells with 4G of memory. We have been experiencing the same google problems. My boss has asked me to roll one of them back to a 2.2 kernel (we will live with 2G of memory for now) while I help out with testing mmap problems.

I am, however, having problems compiling the 2.2.19 kernel with the linux-2.2.19-reiserfs-3.5.34-patch.bz2 patch. I get the following error:

ld -m elf_i386 -T /home/sven/linux-build/linux-2.2.19-reiserfs/arch/i386/vmlinux.lds -e stext arch/i386/kernel/head.o arch/i386/kernel/init_task.o init/main.o init/version.o \
        --start-group \
        arch/i386/kernel/kernel.o arch/i386/mm/mm.o kernel/kernel.o mm/mm.o fs/fs.o ipc/ipc.o \
        fs/filesystems.a \
        net/network.a \
        drivers/block/block.a drivers/char/char.o drivers/misc/misc.a drivers/net/net.a drivers/scsi/scsi.a drivers/pci/pci.a drivers/video/video.a \
        /home/sven/linux-build/linux-2.2.19-reiserfs/arch/i386/lib/lib.a /home/sven/linux-build/linux-2.2.19-reiserfs/lib/lib.a /home/sven/linux-build/linux-2.2.19-reiserfs/arch/i386/lib/lib.a \
        --end-group \
        -o vmlinux
fs/filesystems.a(reiserfs.o): In function `ip_check_balance':
reiserfs.o(.text+0x9cc2): undefined reference to `memset'
drivers/scsi/scsi.a(aic7xxx.o): In function `aic7xxx_load_seeprom':
aic7xxx.o(.text+0x117ff): undefined reference to `memcpy'
make: *** [vmlinux] Error 1

I tried applying the 2.2.20-pre12 patch but got the same (or similar) results.

After I get this 2.2.19 system stable my boss says fixing the google bug is my "top priority". Unfortunately this will likely be only the second time I dig into linux source code, so I expect it will be mostly me testing other people's patches.
But I will try my best.

Here is info from my 2.2.19 system as asked for in the REPORTING-BUGS file:

This is a Red Hat 7.1 system.

$ source scripts/ver_linux
Linux ps1.web.nj.nec.com 2.4.9-marcelo-patch #10 SMP Wed Aug 22 15:13:48 EDT 2001 i686 unknown

Gnu C                  2.96
Gnu make               3.79.1
binutils               2.10.91.0.2
util-linux             2.11b
modutils               2.4.2
e2fsprogs              1.19
reiserfsprogs          3.x.0f
Linux C Library        2.2.2
Dynamic linker (ldd)   2.2.2
Procps                 2.0.7
Net-tools              1.57
Console-tools          0.3.3
Sh-utils               2.0
Modules Loaded         autofs eepro100 md

The Marcelo patch is from the last time I stuck my nose in the kernel with a himem patch. Comparing my diff with the 2.4.13 kernel, the stuff is nearly the same.

$ cat /proc/cpuinfo
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 8
model name      : Pentium III (Coppermine)
stepping        : 6
cpu MHz         : 993.400
cache size      : 256 KB
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 2
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 mmx fxsr sse
bogomips        : 1979.18

processor       : 1
vendor_id       : GenuineIntel
cpu family      : 6
model           : 8
model name      : Pentium III (Coppermine)
stepping        : 6
cpu MHz         : 993.400
cache size      : 256 KB
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 2
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 mmx fxsr sse
bogomips        : 1985.74

$ cat /proc/scsi/scsi
Attached devices:
Host: scsi0 Channel: 00 Id: 00 Lun: 00
  Vendor: SEAGATE  Model: ST173404LC  Rev: 0004
  Type:   Direct-Access  ANSI SCSI revision: 03
Host: scsi0 Channel: 00 Id: 01 Lun: 00
  Vendor: SEAGATE  Model: ST173404LC  Rev: 0004
  Type:   Direct-Access  ANSI SCSI revision: 03
Host: scsi0 Channel: 00 Id: 02 Lun: 00
  Vendor: SEAGATE  Model: ST173404LC  Rev: 0004
  Type:   Direct-Access  ANSI SCSI revision: 03
Host: scsi0 Channel: 00 Id: 03 Lun: 00
  Vendor: SEAGATE  Model: ST173404LC  Rev: 0004
  Type:   Direct-Access  ANSI SCSI revision: 03
Host: scsi0 Channel: 00 Id: 04 Lun: 00
  Vendor: SEAGATE  Model: ST173404LC  Rev: 0004
  Type:   Direct-Access  ANSI SCSI revision: 03
Host: scsi0 Channel: 00 Id: 05 Lun: 00
  Vendor: SEAGATE  Model: ST173404LC  Rev: 0004
  Type:   Direct-Access  ANSI SCSI revision: 03
Host: scsi0 Channel: 00 Id: 06 Lun: 00
  Vendor: DELL  Model: 1x6 U2W SCSI BP  Rev: 5.35
  Type:   Processor  ANSI SCSI revision: 02
Host: scsi2 Channel: 00 Id: 05 Lun: 00
  Vendor: NEC  Model: CD-ROM DRIVE:466  Rev: 1.06
  Type:   CD-ROM  ANSI SCSI revision: 02

bash-2.04$ lsmod
Module                  Size  Used by
autofs                 11920   1  (autoclean)
eepro100               17184   1  (autoclean)
md                     43616   0  (unused)
* Re: undefined reference in 2.2.19 build with Reiserfs (was: Google's mm problem - not reproduced on 2.4.13) 2001-11-01 16:56 ` undefined reference in 2.2.19 build with Reiserfs (was: Google's mm problem - not reproduced on 2.4.13) Sven Heinicke @ 2001-11-01 22:39 ` Keith Owens 0 siblings, 0 replies; 37+ messages in thread
From: Keith Owens @ 2001-11-01 22:39 UTC
To: Sven Heinicke; +Cc: linux-kernel

On Thu, 1 Nov 2001 11:56:04 -0500 (EST), Sven Heinicke <sven@research.nj.nec.com> wrote:
>fs/filesystems.a(reiserfs.o): In function `ip_check_balance':
>reiserfs.o(.text+0x9cc2): undefined reference to `memset'
>drivers/scsi/scsi.a(aic7xxx.o): In function `aic7xxx_load_seeprom':
>aic7xxx.o(.text+0x117ff): undefined reference to `memcpy'

The aic7xxx reference to memcpy is a gcc feature. If you do an assignment of a complete structure then gcc may convert that into a call to memcpy(). Alas gcc does the conversion using the "standard" version of memcpy, not the "optimized by cpp" version that the kernel uses. Try this patch

Index: 19.1/drivers/scsi/aic7xxx.c
--- 19.1/drivers/scsi/aic7xxx.c Tue, 13 Feb 2001 08:26:08 +1100 kaos (linux-2.2/d/b/43_aic7xxx.c 1.1.1.3.2.1.3.1.1.3 644)
+++ 19.1(w)/drivers/scsi/aic7xxx.c Fri, 02 Nov 2001 09:36:49 +1100 kaos (linux-2.2/d/b/43_aic7xxx.c 1.1.1.3.2.1.3.1.1.3 644)
@@ -9190,7 +9190,7 @@ aic7xxx_load_seeprom(struct aic7xxx_host
         p->flags |= AHC_TERM_ENB_SE_LOW | AHC_TERM_ENB_SE_HIGH;
       }
     }
-    p->sc = *sc;
+    memcpy(&(p->sc), sc, sizeof(p->sc));
   }
 
   p->discenable = 0;

Cannot help with the reiserfs problem, the code is not in the pristine 2.2.19 tree.
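The point about structure assignment can be shown outside the kernel. The following is only a userspace sketch: `struct seeprom_cfg` and both function names are hypothetical stand-ins, not the driver's real types. The first function uses whole-struct assignment, which gcc is free to lower to a call to memcpy(); the second spells the call out, which is the shape the patch above uses.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for the driver's config struct; not the real layout. */
struct seeprom_cfg {
    unsigned short device_flags[16];
    unsigned short bios_control;
    unsigned short checksum;
};

/* Whole-struct assignment: gcc may emit an implicit call to memcpy() here. */
void copy_by_assignment(struct seeprom_cfg *dst, const struct seeprom_cfg *src)
{
    assert(dst != NULL && src != NULL);
    *dst = *src;
}

/* Explicit call: the copy goes through whatever memcpy is visible in the source. */
void copy_by_memcpy(struct seeprom_cfg *dst, const struct seeprom_cfg *src)
{
    assert(dst != NULL && src != NULL);
    memcpy(dst, src, sizeof(*dst));
}
```

Both functions leave the destination with identical contents; the difference is only in which symbol the object file ends up referencing, which is what the linker error was about.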
* Re: Google's mm problem - not reproduced on 2.4.13 2001-10-31 21:03 ` Daniel Phillips 2001-10-31 21:53 ` Andreas Dilger @ 2001-10-31 22:12 ` Ben Smith 2001-11-01 0:34 ` Andrea Arcangeli 2001-11-02 17:51 ` Sven Heinicke 1 sibling, 2 replies; 37+ messages in thread From: Ben Smith @ 2001-10-31 22:12 UTC (permalink / raw) To: Daniel Phillips; +Cc: Andrea Arcangeli, linux-kernel, Rik van Riel > On October 31, 2001 09:45 pm, Andrea Arcangeli wrote: > >>On Wed, Oct 31, 2001 at 09:39:12PM +0100, Daniel Phillips wrote: >> >>>On October 31, 2001 07:06 pm, Daniel Phillips wrote: >>> >>>>I just tried your test program with 2.4.13, 2 Gig, and it ran >>>>without problems. Could you try that over there and see if you >>>>get the same result? If it does run, the next move would be to >>>>check with 3.5 Gig. >>>> >>>Ben reports that his test with 2 Gig memory runs fine, as it does >>>for me, but that it locks up tight with 3.5 Gig, requiring power >>>cycle. Since I only have 2 Gig here I can't reproduce that (yet). >>> >>are you sure it isn't an oom condition. can you reproduce on >>2.4.14pre5aa1? mainline (at least before pre6) could deadlock with >>too much mlocked memory. >> > > I don't know, I can't reproduce it here, I don't have enough memory. > Ben? My test application gets killed (I believe by the oom handler). dmesg complains about a lot of 0-order allocation failures. For this test, I'm running with 2.4.14pre5aa1, 3.5gb of RAM, 2 PIII 1Ghz. - Ben Ben Smith Google, Inc ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: Google's mm problem - not reproduced on 2.4.13 2001-10-31 22:12 ` Google's mm problem - not reproduced on 2.4.13 Ben Smith @ 2001-11-01 0:34 ` Andrea Arcangeli 2001-11-02 17:51 ` Sven Heinicke 1 sibling, 0 replies; 37+ messages in thread
From: Andrea Arcangeli @ 2001-11-01 0:34 UTC
To: Ben Smith; +Cc: Daniel Phillips, linux-kernel, Rik van Riel

On Wed, Oct 31, 2001 at 02:12:00PM -0800, Ben Smith wrote:
> My test application gets killed (I believe by the oom handler). dmesg
> complains about a lot of 0-order allocation failures. For this test,
> I'm running with 2.4.14pre5aa1, 3.5gb of RAM, 2 PIII 1Ghz.

Interesting, now we need to find out if the problem is the allocator in 2.4.14pre5aa1 that fails too early by mistake, or if this is a true oom condition. I tend to think it's a true oom condition since mainline deadlocked under the same workload where -aa correctly killed the task.

Can you also provide a 'vmstat 1' trace of the last 20/30 seconds before the task gets killed?

A true oom condition could be caused by a memleak in mlock or something like that (or of course it could be a bug in the userspace testcase, but I checked the testcase a few weeks ago and I didn't find anything wrong in it).

Andrea
* Re: Google's mm problem - not reproduced on 2.4.13 2001-10-31 22:12 ` Google's mm problem - not reproduced on 2.4.13 Ben Smith 2001-11-01 0:34 ` Andrea Arcangeli @ 2001-11-02 17:51 ` Sven Heinicke 2001-11-02 18:00 ` Andrea Arcangeli 2001-11-02 18:11 ` Daniel Phillips 1 sibling, 2 replies; 37+ messages in thread From: Sven Heinicke @ 2001-11-02 17:51 UTC (permalink / raw) To: linux-kernel; +Cc: Daniel Phillips, Ben Smith, Andrea Arcangeli, Rik van Riel Ben Smith writes: > > On October 31, 2001 09:45 pm, Andrea Arcangeli wrote: > > > >>On Wed, Oct 31, 2001 at 09:39:12PM +0100, Daniel Phillips wrote: > >> > >>>On October 31, 2001 07:06 pm, Daniel Phillips wrote: > >>> > >>>>I just tried your test program with 2.4.13, 2 Gig, and it ran > >>>>without problems. Could you try that over there and see if you > >>>>get the same result? If it does run, the next move would be to > >>>>check with 3.5 Gig. > >>>> > >>>Ben reports that his test with 2 Gig memory runs fine, as it does > >>>for me, but that it locks up tight with 3.5 Gig, requiring power > >>>cycle. Since I only have 2 Gig here I can't reproduce that (yet). > >>> > >>are you sure it isn't an oom condition. can you reproduce on > >>2.4.14pre5aa1? mainline (at least before pre6) could deadlock with > >>too much mlocked memory. > >> > > > > I don't know, I can't reproduce it here, I don't have enough memory. > > Ben? > > My test application gets killed (I believe by the oom handler). dmesg > complains about a lot of 0-order allocation failures. For this test, > I'm running with 2.4.14pre5aa1, 3.5gb of RAM, 2 PIII 1Ghz. > - Ben > > Ben Smith > Google, Inc > This is a System with 4G of memory and regular swap. With 2 Pentium III 1Ghz processors. On 2.4.14-pre6aa1 it happily runs until: munmap'ed 7317d000 Loading data at 7317d000 for slot 2 Load (/mnt/sdb/sven/chunk10) succeeded! mlocking slot 2, 7317d000 mlocking at 7317d000 of size 1048576 Connection to hera closed by remote host. Connection to hera closed. 
Where it kills my ssh and other programs. It fills my /var/log/messages with:

Nov 2 11:29:07 ps2 kernel: __alloc_pages: 0-order allocation failed (gfp=0x1d2/0)
Nov 2 11:29:07 ps2 syslogd: select: Cannot allocate memory
Nov 2 11:29:07 ps2 kernel: __alloc_pages: 0-order allocation failed (gfp=0x1d2/0)
Nov 2 11:29:07 ps2 kernel: __alloc_pages: 0-order allocation failed (gfp=0x1f0/0)
Nov 2 11:29:07 ps2 last message repeated 2 times

a bunch of times. It then doesn't free the mmapped memory until the file system is unmounted. It never starts going into swap.

2.4.14-pre5aa1 does about the same.

Sven
* Re: Google's mm problem - not reproduced on 2.4.13 2001-11-02 17:51 ` Sven Heinicke @ 2001-11-02 18:00 ` Andrea Arcangeli 2001-11-02 18:19 ` Daniel Phillips 2001-11-02 18:11 ` Daniel Phillips 1 sibling, 1 reply; 37+ messages in thread From: Andrea Arcangeli @ 2001-11-02 18:00 UTC (permalink / raw) To: Sven Heinicke; +Cc: linux-kernel, Daniel Phillips, Ben Smith, Rik van Riel On Fri, Nov 02, 2001 at 12:51:09PM -0500, Sven Heinicke wrote: > a bunch of times. Then doesn't free the mmaped memory until file > system is unmounted. It never starts going into swap. thanks for testing. This matches the idea that those pages doesn't want to be unmapped for whatever reason (and because there's an mlock in our way at the moment I'd tend to point my finger in that direction rather than into the vm direction). I'll look more closely into this testcase shortly. Andrea ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: Google's mm problem - not reproduced on 2.4.13 2001-11-02 18:00 ` Andrea Arcangeli @ 2001-11-02 18:19 ` Daniel Phillips 2001-11-02 20:27 ` Linus Torvalds [not found] ` <200111022027.fA2KRwe20006@penguin.transmeta.com> 0 siblings, 2 replies; 37+ messages in thread
From: Daniel Phillips @ 2001-11-02 18:19 UTC
To: Andrea Arcangeli, Sven Heinicke; +Cc: linux-kernel, Ben Smith, Rik van Riel

On November 2, 2001 07:00 pm, Andrea Arcangeli wrote:
> On Fri, Nov 02, 2001 at 12:51:09PM -0500, Sven Heinicke wrote:
> > a bunch of times. Then doesn't free the mmaped memory until file
> > system is unmounted. It never starts going into swap.
>
> thanks for testing. This matches the idea that those pages doesn't want
> to be unmapped for whatever reason (and because there's an mlock in our
> way at the moment I'd tend to point my finger in that direction rather
> than into the vm direction). I'll look more closely into this testcase
> shortly.

The mlock handling looks dead simple:

vmscan.c
227         if (vma->vm_flags & (VM_LOCKED|VM_RESERVED))
228                 return count;

It's hard to see how that could be wrong. Plus, this test program does run under 2.4.9, it just uses way too much CPU on that kernel. So I'd say mm bug.

--
Daniel
* Re: Google's mm problem - not reproduced on 2.4.13 2001-11-02 18:19 ` Daniel Phillips @ 2001-11-02 20:27 ` Linus Torvalds 2001-11-02 21:08 ` Ben Smith 2001-11-02 21:12 ` Google's mm problem - not reproduced on 2.4.13 Rik van Riel [not found] ` <200111022027.fA2KRwe20006@penguin.transmeta.com> 1 sibling, 2 replies; 37+ messages in thread From: Linus Torvalds @ 2001-11-02 20:27 UTC (permalink / raw) To: linux-kernel In article <20011102181758Z16039-4784+420@humbolt.nl.linux.org>, Daniel Phillips <phillips@bonn-fries.net> wrote: > >It's hard to see how that could be wrong. Plus, this test program does run >under 2.4.9, it just uses way too much CPU on that kernel. So I'd say mm >bug. So how much memory is mlocked? The locked memory will stay in the inactive list (it won't even ever be activated, because we don't bother even scanning the mapped locked regions), and the inactive list fills up with pages that are completely worthless. And the kernel will decide that because most of the unfreeable pages are mapped, it needs to do VM scanning, which obviously doesn't help. Why _does_ this thing do mlock, anyway? What's the point? And how much does it try to lock? If root wants to shoot himself in the head by mlocking all of memory, that's not a VM problem, that's a stupid administrator problem. Linus ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: Google's mm problem - not reproduced on 2.4.13 2001-11-02 20:27 ` Linus Torvalds @ 2001-11-02 21:08 ` Ben Smith 2001-11-02 21:20 ` Linus Torvalds 2001-11-02 21:12 ` Google's mm problem - not reproduced on 2.4.13 Rik van Riel 1 sibling, 1 reply; 37+ messages in thread
From: Ben Smith @ 2001-11-02 21:08 UTC
To: Linus Torvalds; +Cc: linux-kernel

> So how much memory is mlocked?

In the 3.5G case, we lock 4 blocks (4 * 427683520 bytes, or about 1631M). There is code in the kernel that prevents more than 1/2 of all physical pages from being mlocked:

mlock.c:215-218: (in do_mlock)
        /* we may lock at most half of physical memory... */
        /* (this check is pretty bogus, but doesn't hurt) */
        if (locked > num_physpages/2)
                goto out;

For 2.2 we have a patch that increases this to 90% or 60M, but we don't use this patch on 2.4 yet.

> Why _does_ this thing do mlock, anyway? What's the point? And how much
> does it try to lock?

Latency. We know exactly what data should remain in memory, so we're trying to prevent the vm from paging out the wrong data. It makes a huge difference in performance.

- Ben

Ben Smith
Google, Inc.
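The arithmetic behind the half-of-physical-memory check can be worked through in a few lines. This is a sketch under stated assumptions, not kernel code: it assumes 4096-byte pages (as on i386), and `bytes_to_pages` and `within_mlock_limit` are hypothetical helper names that only mirror the comparison quoted from do_mlock.

```c
#include <assert.h>

/* Assumption: 4096-byte pages, as on i386. */
#define PAGE_SIZE_BYTES 4096ULL

/* Round a byte count up to whole pages. */
unsigned long long bytes_to_pages(unsigned long long bytes)
{
    return (bytes + PAGE_SIZE_BYTES - 1) / PAGE_SIZE_BYTES;
}

/*
 * Mirrors the quoted comparison: the request is refused
 * once locked > num_physpages/2, so it passes while locked <= half.
 */
int within_mlock_limit(unsigned long long locked_pages,
                       unsigned long long num_physpages)
{
    assert(num_physpages > 0);
    return locked_pages <= num_physpages / 2;
}
```

Taking 917504 pages as an idealised stand-in for 3.5G (the kernel's real num_physpages will differ somewhat), one 427683520-byte block rounds up to 104415 pages, so four blocks come to 417660 pages against a limit of 458752. The request passes the check with some room to spare, while a fifth block would not.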
* Re: Google's mm problem - not reproduced on 2.4.13 2001-11-02 21:08 ` Ben Smith @ 2001-11-02 21:20 ` Linus Torvalds 2001-11-02 22:42 ` Ben Smith 0 siblings, 1 reply; 37+ messages in thread From: Linus Torvalds @ 2001-11-02 21:20 UTC (permalink / raw) To: linux-kernel In article <3BE30B3D.1080505@google.com>, Ben Smith <ben@google.com> wrote: > >For 2.2 we were have a patch that increases this to 90% or 60M, but we >don't use this patch on 2.4 yet. Well, you'll also deadlock your machine if you happen to lock down the lowmemory area on x86. Sounds like a _bad_ idea. Anyway, I posted a suggested patch that should fix the behaviour, but it doesn't fix the fundamental problem with locking the wrong kinds of pages (ie you're definitely on your own if you happen to lock down most of the low 1GB of an intel machine). >Latency. We know exactly what data should remain in memory, so we're >trying to prevent the vm from paging out the wrong data. It makes a huge >difference in performance. It would be interesting to hear whether that is equally true in the new VM that doesn't necessarily page stuff out unless it can show that the memory pressure is actually from VM mappings. How big is your mlock area during real load? Still the "max the kernel will allow"? Or is that just a benchmark/test kind of thing? Linus ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: Google's mm problem - not reproduced on 2.4.13 2001-11-02 21:20 ` Linus Torvalds @ 2001-11-02 22:42 ` Ben Smith 2001-11-02 23:15 ` Daniel Phillips 2001-11-03 22:53 ` Adaptec vs Symbios performance Stephan von Krawczynski 0 siblings, 2 replies; 37+ messages in thread
From: Ben Smith @ 2001-11-02 22:42 UTC
To: linux-kernel

> Anyway, I posted a suggested patch that should fix the behaviour, but it
> doesn't fix the fundamental problem with locking the wrong kinds of
> pages (ie you're definitely on your own if you happen to lock down most
> of the low 1GB of an intel machine).

I've tried the patch you sent and it doesn't help. I applied the patch to 2.4.13-pre7 and it hung the machine in the same way (ctrl-alt-del didn't work). The last few lines of vmstat before the machine hung look like this:

 0  1  0      0 133444   5132 3367312   0   0 31196     0 1121  2123   0   6  94
 0  1  0      0  63036   5216 3435920   0   0 34338    14 1219  2272   0   5  95
 2  0  1      0   6156   1828 3494904   0   0 31268     0 1130  2198   0  23  77
 1  0  1      0   3596    864 3498488   0   0  2720    16 1640  1068   0  88  12

> It would be interesting to hear whether that is equally true in the new
> VM that doesn't necessarily page stuff out unless it can show that the
> memory pressure is actually from VM mappings.
>
> How big is your mlock area during real load? Still the "max the kernel
> will allow"? Or is that just a benchmark/test kind of thing?

I haven't had a chance to try my real app yet, but my test application is a good simulation of what the real program does, minus any of the accessing of the data that it maps. Since it's the only application running, and for performance reasons we'd need all of our data in memory, we map the "max the kernel will allow".

As another note, I've re-written my test application to use madvise instead of mlock, on a suggestion from Andrea. It also doesn't work. For 2.4.13, after running for a while, my test app hangs, using one CPU, and kswapd consumes the other CPU. I was eventually able to kill my test app.
I've also re-written my test app to use anonymous mmap, followed by a mlock and read()'s. This actually does work without problems, but doesn't really do what we want for other reasons. - Ben Ben Smith Google, Inc. ^ permalink raw reply [flat|nested] 37+ messages in thread
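For readers unfamiliar with the variant described here, the following is a minimal sketch of "anonymous mmap, followed by a mlock and read()'s". It is not Ben's test program, which does not appear in this thread; `load_locked` is a hypothetical helper, and the error handling is only one reasonable choice.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: map len bytes of anonymous memory, lock it,
 * then fill it from the file at path with read(). Returns NULL on failure.
 */
void *load_locked(const char *path, size_t len)
{
    assert(path != NULL && len > 0);

    void *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED)
        return NULL;

    /* Pin the pages before loading, so the data is never paged out. */
    if (mlock(buf, len) != 0) {
        fprintf(stderr, "mlock: %s\n", strerror(errno));
        munmap(buf, len);
        return NULL;
    }

    int fd = open(path, O_RDONLY);
    if (fd < 0) {
        munmap(buf, len);
        return NULL;
    }

    size_t done = 0;
    while (done < len) {
        ssize_t n = read(fd, (char *)buf + done, len - done);
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted, retry the read */
            close(fd);
            munmap(buf, len);
            return NULL;
        }
        if (n == 0)
            break;              /* short file: the rest stays zero-filled */
        done += (size_t)n;
    }
    close(fd);
    return buf;
}
```

The caller releases the region with munmap(), which also drops the lock. Unlike a file-backed mapping, the locked pages here are anonymous memory holding a private copy of the file contents.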
* Re: Google's mm problem - not reproduced on 2.4.13 2001-11-02 22:42 ` Ben Smith @ 2001-11-02 23:15 ` Daniel Phillips 2001-11-03 22:53 ` Adaptec vs Symbios performance Stephan von Krawczynski 1 sibling, 0 replies; 37+ messages in thread From: Daniel Phillips @ 2001-11-02 23:15 UTC (permalink / raw) To: Andrea Arcangeli; +Cc: Ben Smith, linux-kernel On November 2, 2001 11:42 pm, Ben Smith wrote: > As another note, I've re-written my test application to use madvise > instead of mlock, on a suggestion from Andrea. It also doesn't work. For > 2.4.13, after running for a while, my test app hangs, using one CPU, and > kswapd consumes the other CPU. I was eventually able to kill my test app. OK, while there may be room for debate over whether the mlock problem is a bug there's no question with madvise. The program still doesn't work if you replace the mlocks with madvises (except for the mlock that's used to estimate memory size). -- Daniel ^ permalink raw reply [flat|nested] 37+ messages in thread
* Adaptec vs Symbios performance 2001-11-02 22:42 ` Ben Smith 2001-11-02 23:15 ` Daniel Phillips @ 2001-11-03 22:53 ` Stephan von Krawczynski 2001-11-03 23:01 ` arjan 1 sibling, 1 reply; 37+ messages in thread
From: Stephan von Krawczynski @ 2001-11-03 22:53 UTC
To: linux-kernel; +Cc: groudier

Hello Justin, hello Gerard

I am currently looking for reasons for the bad behaviour of the aic7xxx driver in a shared interrupt setup and the generally not-nice behaviour of the driver in a multi-tasking environment. Here is what I found in the code:

/*
 * SCSI controller interrupt handler.
 */
void ahc_linux_isr(int irq, void *dev_id, struct pt_regs * regs)
{
        struct ahc_softc *ahc;
        struct ahc_cmd *acmd;
        u_long flags;

        ahc = (struct ahc_softc *) dev_id;
        ahc_lock(ahc, &flags);
        ahc_intr(ahc);
        /*
         * It would be nice to run the device queues from a
         * bottom half handler, but as there is no way to
         * dynamically register one, we'll have to postpone
         * that until we get integrated into the kernel.
         */
        ahc_linux_run_device_queues(ahc);
        acmd = TAILQ_FIRST(&ahc->platform_data->completeq);
        TAILQ_INIT(&ahc->platform_data->completeq);
        ahc_unlock(ahc, &flags);
        if (acmd != NULL)
                ahc_linux_run_complete_queue(ahc, acmd);
}

This is nice. I cannot read the complete code around it (it is derived from aic7xxx_linux.c) but if I understand the naming and comments correctly, some workload is done inside the hardware interrupt (which it shouldn't be), which would very much match my tests showing bad overall performance behaviour. Obviously this code is old (read the comment) and needs reworking. Comments?

Regards, Stephan
* Re: Adaptec vs Symbios performance 2001-11-03 22:53 ` Adaptec vs Symbios performance Stephan von Krawczynski @ 2001-11-03 23:01 ` arjan 0 siblings, 0 replies; 37+ messages in thread From: arjan @ 2001-11-03 23:01 UTC (permalink / raw) To: Stephan von Krawczynski; +Cc: linux-kernel In article <200111032253.XAA20342@webserver.ithnet.com> you wrote: > Hello Justin, hello Gerard > > I am looking currently for reasons for bad behaviour of aic7xxx driver > in an shared interrupt setup and general not-nice behaviour of the > driver regarding multi-tasking environment. > Here is what I found in the code: > * It would be nice to run the device queues from a > * bottom half handler, but as there is no way to > * dynamically register one, we'll have to postpone > * that until we get integrated into the kernel. > */ sounds like a good tasklet candidate...... ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: Google's mm problem - not reproduced on 2.4.13 2001-11-02 20:27 ` Linus Torvalds 2001-11-02 21:08 ` Ben Smith @ 2001-11-02 21:12 ` Rik van Riel 1 sibling, 0 replies; 37+ messages in thread From: Rik van Riel @ 2001-11-02 21:12 UTC (permalink / raw) To: Linus Torvalds; +Cc: linux-kernel On Fri, 2 Nov 2001, Linus Torvalds wrote: > If root wants to shoot himself in the head by mlocking all of memory, > that's not a VM problem, that's a stupid administrator problem. The kernel limits the amount of mlock()d memory to 50% of RAM, so we _should_ be ok. (yes, this limit is per process, but daniel only has one process running anyway) regards, Rik -- DMCA, SSSCA, W3C? Who cares? http://thefreeworld.net/ http://www.surriel.com/ http://distro.conectiva.com/ ^ permalink raw reply [flat|nested] 37+ messages in thread
[parent not found: <200111022027.fA2KRwe20006@penguin.transmeta.com>]
* Re: Google's mm problem - not reproduced on 2.4.13 [not found] ` <200111022027.fA2KRwe20006@penguin.transmeta.com> @ 2001-11-02 20:58 ` Daniel Phillips 0 siblings, 0 replies; 37+ messages in thread
From: Daniel Phillips @ 2001-11-02 20:58 UTC
To: Sven Heinicke, linux-kernel; +Cc: Ben Smith, Andrea Arcangeli, Rik van Riel

On November 2, 2001 09:27 pm, Linus Torvalds wrote:
> In article <20011102181758Z16039-4784+420@humbolt.nl.linux.org>,
> Daniel Phillips <phillips@bonn-fries.net> wrote:
> >
> >It's hard to see how that could be wrong. Plus, this test program does
> >run under 2.4.9, it just uses way too much CPU on that kernel. So I'd say
> >mm bug.
>
> So how much memory is mlocked?

I'm not sure exactly, I didn't run the test. I *think* it's just over 50% of physical memory.

> The locked memory will stay in the inactive list (it won't even ever be
> activated, because we don't bother even scanning the mapped locked
> regions), and the inactive list fills up with pages that are completely
> worthless.

Yes, it does various things on various vms. On 2.4.9 it stays on the inactive list until free memory gets down to rock bottom, then most of it moves to the active list and the system reaches a steady state where it can operate, though with kswapd grabbing 99% CPU (two processor system), but the test does complete. On the current kernel it dies.

> And the kernel will decide that because most of the unfreeable pages are
> mapped, it needs to do VM scanning, which obviously doesn't help.
>
> Why _does_ this thing do mlock, anyway? What's the point? And how much
> does it try to lock?

It's how the google database engine works, and keeps latency down, by mapping big database files into memory. I didn't get more information than that on the application.

> If root wants to shoot himself in the head by mlocking all of memory,
> that's not a VM problem, that's a stupid administrator problem.

In the tests I did, it was about 1 gig out of 2.
I'm not sure how much memory is mlocked in the 3.5 Gig test, the one that's failing, but it's certainly not anything like all of memory. Really, we should be able to mlock 90%+ of memory without falling over. -- Daniel ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: Google's mm problem - not reproduced on 2.4.13 2001-11-02 17:51 ` Sven Heinicke 2001-11-02 18:00 ` Andrea Arcangeli @ 2001-11-02 18:11 ` Daniel Phillips 2001-11-02 18:48 ` Sven Heinicke 1 sibling, 1 reply; 37+ messages in thread From: Daniel Phillips @ 2001-11-02 18:11 UTC (permalink / raw) To: Sven Heinicke, linux-kernel; +Cc: Ben Smith, Andrea Arcangeli, Rik van Riel On November 2, 2001 06:51 pm, Sven Heinicke wrote: > Ben Smith writes: > > > On October 31, 2001 09:45 pm, Andrea Arcangeli wrote: > > > > > >>On Wed, Oct 31, 2001 at 09:39:12PM +0100, Daniel Phillips wrote: > > >> > > >>>On October 31, 2001 07:06 pm, Daniel Phillips wrote: > > >>> > > >>>>I just tried your test program with 2.4.13, 2 Gig, and it ran > > >>>>without problems. Could you try that over there and see if you > > >>>>get the same result? If it does run, the next move would be to > > >>>>check with 3.5 Gig. > > >>>> > > >>>Ben reports that his test with 2 Gig memory runs fine, as it does > > >>>for me, but that it locks up tight with 3.5 Gig, requiring power > > >>>cycle. Since I only have 2 Gig here I can't reproduce that (yet). > > >>> > > >>are you sure it isn't an oom condition. can you reproduce on > > >>2.4.14pre5aa1? mainline (at least before pre6) could deadlock with > > >>too much mlocked memory. > > >> > > > > > > I don't know, I can't reproduce it here, I don't have enough memory. > > > Ben? > > > > My test application gets killed (I believe by the oom handler). dmesg > > complains about a lot of 0-order allocation failures. For this test, > > I'm running with 2.4.14pre5aa1, 3.5gb of RAM, 2 PIII 1Ghz. > > - Ben > > > > Ben Smith > > Google, Inc > > > > > This is a System with 4G of memory and regular swap. With 2 Pentium > III 1Ghz processors. > > On 2.4.14-pre6aa1 it happily runs until: > > munmap'ed 7317d000 > Loading data at 7317d000 for slot 2 > Load (/mnt/sdb/sven/chunk10) succeeded! 
> mlocking slot 2, 7317d000 > mlocking at 7317d000 of size 1048576 > Connection to hera closed by remote host. > Connection to hera closed. > > Where it kills my ssh and other programs, and fills my /var/log/messages > with: > > Nov 2 11:29:07 ps2 kernel: __alloc_pages: 0-order allocation failed (gfp=0x1d2/0) > Nov 2 11:29:07 ps2 syslogd: select: Cannot allocate memory > Nov 2 11:29:07 ps2 kernel: __alloc_pages: 0-order allocation failed (gfp=0x1d2/0) > Nov 2 11:29:07 ps2 kernel: __alloc_pages: 0-order allocation failed (gfp=0x1f0/0) > Nov 2 11:29:07 ps2 last message repeated 2 times > > a bunch of times. Then doesn't free the mmaped memory until file > system is unmounted. Not freeing the memory is expected and normal. The previously-mlocked file data remains cached in that memory, and even though it's not free, it's 'easily freeable' so there's no smoking gun there. The reason the memory is freed on umount is, there's no possibility that that file data can be referenced again and it makes sense to free it up immediately. On the other hand, the 0-order failures and oom-kills indicate a genuine bug. > It never starts going into swap. > > 2.4.14-pre5aa1 does about the same. -- Daniel ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: Google's mm problem - not reproduced on 2.4.13 2001-11-02 18:11 ` Daniel Phillips @ 2001-11-02 18:48 ` Sven Heinicke 2001-11-02 18:57 ` Daniel Phillips 0 siblings, 1 reply; 37+ messages in thread From: Sven Heinicke @ 2001-11-02 18:48 UTC (permalink / raw) To: linux-kernel; +Cc: Daniel Phillips, Ben Smith, Andrea Arcangeli, Rik van Riel > Not freeing the memory is expected and normal. The previously-mlocked file > data remains cached in that memory, and even though it's not free, it's > 'easily freeable' so there's no smoking gun there. The reason the memory is > freed on umount is, there's no possibility that that file data can be > referenced again and it makes sense to free it up immediately. That's cool and all, but how do I free up the memory w/o umounting the partition? Also, I just tried 2.4.14-pre7. It acted the same way as 2.4.13 does, requiring the reset key to continue. Sven ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: Google's mm problem - not reproduced on 2.4.13 2001-11-02 18:48 ` Sven Heinicke @ 2001-11-02 18:57 ` Daniel Phillips 0 siblings, 0 replies; 37+ messages in thread From: Daniel Phillips @ 2001-11-02 18:57 UTC (permalink / raw) To: Sven Heinicke, linux-kernel; +Cc: Ben Smith, Andrea Arcangeli, Rik van Riel On November 2, 2001 07:48 pm, Sven Heinicke wrote: > > Not freeing the memory is expected and normal. The previously-mlocked file > > data remains cached in that memory, and even though it's not free, it's > > 'easily freeable' so there's no smoking gun there. The reason the memory is > > freed on umount is, there's no possibility that that file data can be > > referenced again and it makes sense to free it up immediately. > > That's cool and all, but how do I free up the memory w/o umounting the > partition? You don't, that's the mm's job. It tries to do it at the last minute, when it's sure the memory is needed for something more important. > Also, I just tried 2.4.14-pre7. It acted the same way as 2.4.13 does, > requiring the reset key to continue. -- Daniel ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: Google's mm problem - not reproduced on 2.4.13 2001-10-31 20:45 ` Andrea Arcangeli 2001-10-31 21:03 ` Daniel Phillips @ 2001-11-01 0:19 ` Daniel Phillips 2001-11-01 0:29 ` Andrea Arcangeli 2001-11-01 1:17 ` Ben Smith 1 sibling, 2 replies; 37+ messages in thread From: Daniel Phillips @ 2001-11-01 0:19 UTC (permalink / raw) To: Andrea Arcangeli, Ben Smith; +Cc: linux-kernel, Rik van Riel On October 31, 2001 09:45 pm, Andrea Arcangeli wrote: > On Wed, Oct 31, 2001 at 09:39:12PM +0100, Daniel Phillips wrote: > > On October 31, 2001 07:06 pm, Daniel Phillips wrote: > > > I just tried your test program with 2.4.13, 2 Gig, and it ran without > > > problems. Could you try that over there and see if you get the same result? > > > If it does run, the next move would be to check with 3.5 Gig. > > > > Ben reports that his test with 2 Gig memory runs fine, as it does for me, but > > that it locks up tight with 3.5 Gig, requiring power cycle. Since I only > > have 2 Gig here I can't reproduce that (yet). > > are you sure it isn't an oom condition. The way the test code works is, it keeps mlocking more blocks of memory until one of the mlocks fails, and then it does the rest of its work with that many blocks of memory. It's hard to see how we could get a legitimate oom with that strategy. > can you reproduce on > 2.4.14pre5aa1? mainline (at least before pre6) could deadlock with too > much mlocked memory. OK, he tried it with pre5aa1: ben> My test application gets killed (I believe by the oom handler). dmesg ben> complains about a lot of 0-order allocation failures. For this test, ben> I'm running with 2.4.14pre5aa1, 3.5gb of RAM, 2 PIII 1Ghz. *Just in case* it's oom-related I've asked Ben to try it with one less than the maximum number of memory blocks he can allocate. If it does turn out to be oom, it's still a bug, right? -- Daniel ^ permalink raw reply [flat|nested] 37+ messages in thread
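For readers without access to the test program, the strategy described above (keep mlocking blocks until one mlock fails, then work with however many succeeded) can be sketched roughly as follows. This is not Ben's code: the helper names are hypothetical, and anonymous read-only mappings are used here only to keep the sketch self-contained, whereas the real test maps database files.

```c
#define _DEFAULT_SOURCE   /* for MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical helper: map and mlock up to max_blocks regions of
 * block_size bytes each, stopping at the first mmap or mlock failure.
 * Returns how many blocks ended up locked; addresses go into blocks[]. */
size_t lock_blocks_until_failure(void **blocks, size_t max_blocks,
                                 size_t block_size)
{
    size_t n = 0;

    assert(blocks != NULL && block_size > 0);

    while (n < max_blocks) {
        /* Assumption: anonymous mapping instead of a file-backed one. */
        void *p = mmap(NULL, block_size, PROT_READ,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED)
            break;
        if (mlock(p, block_size) != 0) {
            /* This block could not be locked: give it back and stop. */
            munmap(p, block_size);
            break;
        }
        blocks[n++] = p;
    }
    return n;
}

/* Hypothetical helper: unlock and unmap the first n blocks. */
void unlock_blocks(void **blocks, size_t n, size_t block_size)
{
    for (size_t i = 0; i < n; i++) {
        munlock(blocks[i], block_size);
        munmap(blocks[i], block_size);
    }
}
```

Since the program only ever holds blocks whose mlock succeeded, a failure to lock is handled by the program itself, which is why an oom kill looks suspicious here.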
* Re: Google's mm problem - not reproduced on 2.4.13 2001-11-01 0:19 ` Daniel Phillips @ 2001-11-01 0:29 ` Andrea Arcangeli 2001-11-01 1:17 ` Ben Smith 1 sibling, 0 replies; 37+ messages in thread From: Andrea Arcangeli @ 2001-11-01 0:29 UTC (permalink / raw) To: Daniel Phillips; +Cc: Ben Smith, linux-kernel, Rik van Riel On Thu, Nov 01, 2001 at 01:19:15AM +0100, Daniel Phillips wrote: > If it does turn out to be oom, it's still a bug, right? The testcase I checked a few weeks ago looked correct, so whatever it is, it should be a kernel bug. Andrea ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: Google's mm problem - not reproduced on 2.4.13 2001-11-01 0:19 ` Daniel Phillips 2001-11-01 0:29 ` Andrea Arcangeli @ 2001-11-01 1:17 ` Ben Smith 2001-11-01 1:41 ` Rik van Riel 1 sibling, 1 reply; 37+ messages in thread From: Ben Smith @ 2001-11-01 1:17 UTC (permalink / raw) To: Daniel Phillips; +Cc: Andrea Arcangeli, linux-kernel, Rik van Riel > *Just in case* it's oom-related I've asked Ben to try it with one less than > the maximum number of memory blocks he can allocate. I've run this test with my 3.5G machine, 3 blocks instead of 4 blocks, and it has the same behavior (my app gets killed, 0-order allocation failures, and the system stays up). - Ben Ben Smith Google, Inc ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: Google's mm problem - not reproduced on 2.4.13 2001-11-01 1:17 ` Ben Smith @ 2001-11-01 1:41 ` Rik van Riel 2001-11-01 1:55 ` Ben Smith 0 siblings, 1 reply; 37+ messages in thread From: Rik van Riel @ 2001-11-01 1:41 UTC (permalink / raw) To: Ben Smith; +Cc: Daniel Phillips, Andrea Arcangeli, linux-kernel On Wed, 31 Oct 2001, Ben Smith wrote: > > *Just in case* it's oom-related I've asked Ben to try it with one less than > > the maximum number of memory blocks he can allocate. > > I've run this test with my 3.5G machine, 3 blocks instead of 4 blocks, > and it has the same behavior (my app gets killed, 0-order allocation > failures, and the system stays up. If you still have swap free at the point where the process gets killed, or if the memory is file-backed, then we are positive it's a kernel bug. regards, Rik -- DMCA, SSSCA, W3C? Who cares? http://thefreeworld.net/ http://www.surriel.com/ http://distro.conectiva.com/ ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: Google's mm problem - not reproduced on 2.4.13 2001-11-01 1:41 ` Rik van Riel @ 2001-11-01 1:55 ` Ben Smith 2001-11-01 2:06 ` Andrea Arcangeli 0 siblings, 1 reply; 37+ messages in thread From: Ben Smith @ 2001-11-01 1:55 UTC (permalink / raw) To: Rik van Riel; +Cc: Daniel Phillips, Andrea Arcangeli, linux-kernel >>>*Just in case* it's oom-related I've asked Ben to try it with one less than >>>the maximum number of memory blocks he can allocate. >>> >>I've run this test with my 3.5G machine, 3 blocks instead of 4 blocks, >>and it has the same behavior (my app gets killed, 0-order allocation >>failures, and the system stays up. >> > > If you still have swap free at the point where the process > gets killed, or if the memory is file-backed, then we are > positive it's a kernel bug. This machine is configured without a swap file. The memory is file backed, though (read-only mmap, followed by a mlock). - Ben Ben Smith Google, Inc ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: Google's mm problem - not reproduced on 2.4.13 2001-11-01 1:55 ` Ben Smith @ 2001-11-01 2:06 ` Andrea Arcangeli 0 siblings, 0 replies; 37+ messages in thread From: Andrea Arcangeli @ 2001-11-01 2:06 UTC (permalink / raw) To: Ben Smith; +Cc: Rik van Riel, Daniel Phillips, linux-kernel On Wed, Oct 31, 2001 at 05:55:25PM -0800, Ben Smith wrote: > >>>*Just in case* it's oom-related I've asked Ben to try it with one less than > >>>the maximum number of memory blocks he can allocate. > >>> > >>I've run this test with my 3.5G machine, 3 blocks instead of 4 blocks, > >>and it has the same behavior (my app gets killed, 0-order allocation > >>failures, and the system stays up. > >> > > > > If you still have swap free at the point where the process > > gets killed, or if the memory is file-backed, then we are > > positive it's a kernel bug. > > This machine is configured without a swap file. The memory is file backed, ok fine on this side. so again, what's happening is the equivalent of mlock leaving those mappings locked. It seems the previous mlock is forbidding the cache to be released. Otherwise I don't see why the kernel shouldn't release the cache correctly. So it could be an mlock bug in the kernel. Andrea ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: Google's mm problem - not reproduced on 2.4.13 2001-10-31 20:39 ` Daniel Phillips 2001-10-31 20:45 ` Andrea Arcangeli @ 2001-10-31 20:48 ` Rik van Riel 2001-10-31 21:04 ` Daniel Phillips 1 sibling, 1 reply; 37+ messages in thread From: Rik van Riel @ 2001-10-31 20:48 UTC (permalink / raw) To: Daniel Phillips; +Cc: linux-kernel, Andrea Arcangeli On Wed, 31 Oct 2001, Daniel Phillips wrote: > On October 31, 2001 07:06 pm, Daniel Phillips wrote: > > I just tried your test program with 2.4.13, 2 Gig, and it ran without > > problems. Could you try that over there and see if you get the same result? > > If it does run, the next move would be to check with 3.5 Gig. > > Ben reports that his test with 2 Gig memory runs fine, as it does for > me, but that it locks up tight with 3.5 Gig, requiring power cycle. > Since I only have 2 Gig here I can't reproduce that (yet). Does it lock up if your low memory is reduced to 512 MB ? Rik -- DMCA, SSSCA, W3C? Who cares? http://thefreeworld.net/ http://www.surriel.com/ http://distro.conectiva.com/ ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: Google's mm problem - not reproduced on 2.4.13 2001-10-31 20:48 ` Rik van Riel @ 2001-10-31 21:04 ` Daniel Phillips 2001-10-31 21:08 ` Rik van Riel 0 siblings, 1 reply; 37+ messages in thread From: Daniel Phillips @ 2001-10-31 21:04 UTC (permalink / raw) To: Rik van Riel; +Cc: linux-kernel, Andrea Arcangeli, Ben Smith On October 31, 2001 09:48 pm, Rik van Riel wrote: > On Wed, 31 Oct 2001, Daniel Phillips wrote: > > On October 31, 2001 07:06 pm, Daniel Phillips wrote: > > > I just tried your test program with 2.4.13, 2 Gig, and it ran without > > > problems. Could you try that over there and see if you get the same result? > > > If it does run, the next move would be to check with 3.5 Gig. > > > > Ben reports that his test with 2 Gig memory runs fine, as it does for > > me, but that it locks up tight with 3.5 Gig, requiring power cycle. > > Since I only have 2 Gig here I can't reproduce that (yet). > > Does it lock up if your low memory is reduced to 512 MB ? Ben? -- Daniel ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: Google's mm problem - not reproduced on 2.4.13 2001-10-31 21:04 ` Daniel Phillips @ 2001-10-31 21:08 ` Rik van Riel 0 siblings, 0 replies; 37+ messages in thread From: Rik van Riel @ 2001-10-31 21:08 UTC (permalink / raw) To: Daniel Phillips; +Cc: linux-kernel, Andrea Arcangeli, Ben Smith On Wed, 31 Oct 2001, Daniel Phillips wrote: > On October 31, 2001 09:48 pm, Rik van Riel wrote: > > On Wed, 31 Oct 2001, Daniel Phillips wrote: > > > Ben reports that his test with 2 Gig memory runs fine, as it does for > > > me, but that it locks up tight with 3.5 Gig, requiring power cycle. > > > Since I only have 2 Gig here I can't reproduce that (yet). > > > > Does it lock up if your low memory is reduced to 512 MB ? > > Ben? Nonono, I mean that if _you_ reduce low memory to 512MB on your 2GB machine, maybe you can reproduce the problem more easily. If the Google people try this with larger machines, it'll almost certainly make triggering the bug even easier ;) regards, Rik -- DMCA, SSSCA, W3C? Who cares? http://thefreeworld.net/ http://www.surriel.com/ http://distro.conectiva.com/ ^ permalink raw reply [flat|nested] 37+ messages in thread
[parent not found: <Pine.LNX.4.33.0111021250560.20078-100000@penguin.transmeta.com>]
* Re: Google's mm problem - not reproduced on 2.4.13 [not found] <Pine.LNX.4.33.0111021250560.20078-100000@penguin.transmeta.com> @ 2001-11-02 21:13 ` Linus Torvalds 2001-11-02 21:27 ` Stephan von Krawczynski 0 siblings, 1 reply; 37+ messages in thread From: Linus Torvalds @ 2001-11-02 21:13 UTC (permalink / raw) To: Kernel Mailing List; +Cc: Daniel Phillips [ Slightly updated version of earlier private email ] On Fri, 2 Nov 2001, Daniel Phillips wrote: > > Yes, it does various things on various vms. On 2.4.9 it stays on the > inactive list until free memory gets down to rock bottom, then most of it > moves to the active list and the system reaches a steady state where it can > operate, though with kswapd grabbing 99% CPU (two processor system), but the > test does complete. On the current kernel it dies. On the 2.4.9 kernel, the "active" list is completely and utterly misnamed. We move random pages to the active list, for random reasons. One of the random reasons we have is "this page is mapped". Which has nothing to do with activeness. The "active" list might as well have been called "random_list_two". In the new VM, only _active_ pages get moved to the active list. So the mlocked pages will stay on the inactive list until somebody says they are active. And right now nobody will ever say that they are active, because we don't even scan the locked areas. And the advantage of the non-random approach is that in the new VM, we can _use_ the knowledge that the inactive list has filled up with mapped pages to make a _useful_ decision: we decide that we need to start scanning the VM tree and try to remove pages from the mappings. Notice? No more "random decisions". We have a well-defined point where we can say "Ok, our inactive list seems to be mostly mapped, so let's try to unmap something". In short, 2.4.9 handles the test because it does everything else wrong.
While 2.4.13 doesn't handle the test well, because the VM says "there's a _lot_ of inactive mapped pages, I need to _do_ something about it". And then vmscanning doesn't actually do anything. Suggested patch appended. > In the tests I did, it was about 1 gig out of 2. I'm not sure how much > memory is mlocked in the 3.5 Gig test the one that's failing, but it's > certainly not anything like all of memory. Really, we should be able to > mlock 90%+ of memory without falling over. Not a way in hell, for many reasons, and none of them have anything to do with this particular problem. The most _trivial_ reason is that if you lock more than 900MB of memory, that locked area may well be all of the lowmem pages, and you're now screwed forever. Dead, dead, dead. (And I can come up with loads that do exactly the above: it's easy enough to try to first allocate up all of highmem, and then do a mlock and try to allocate up all of lowmem locked. It's even easier if you use loopback or something that only wants to allocate lowmem in the first place). In short, we MUST NOT mlock more than maybe 500MB _tops_ on intel. If we ever do, our survival is pretty random, regardless of other VM issues. The appended patch should fix the unintentional problem, though. Linus
----
diff -u --recursive --new-file penguin/linux/mm/vmscan.c linux/mm/vmscan.c
--- penguin/linux/mm/vmscan.c Thu Nov 1 17:59:12 2001
+++ linux/mm/vmscan.c Fri Nov 2 13:10:58 2001
@@ -49,7 +49,7 @@
 	swp_entry_t entry;
 
 	/* Don't look at this pte if it's been accessed recently. */
-	if (ptep_test_and_clear_young(page_table)) {
+	if ((vma->vm_flags & VM_LOCKED) || ptep_test_and_clear_young(page_table)) {
 		mark_page_accessed(page);
 		return 0;
 	}
@@ -220,8 +220,8 @@
 	pgd_t *pgdir;
 	unsigned long end;
 
-	/* Don't swap out areas which are locked down */
-	if (vma->vm_flags & (VM_LOCKED|VM_RESERVED))
+	/* Don't swap out areas which are reserved */
+	if (vma->vm_flags & VM_RESERVED)
 		return count;
 
 	pgdir = pgd_offset(mm, address);
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: Google's mm problem - not reproduced on 2.4.13 2001-11-02 21:13 ` Linus Torvalds @ 2001-11-02 21:27 ` Stephan von Krawczynski 2001-11-03 0:16 ` Linus Torvalds 0 siblings, 1 reply; 37+ messages in thread From: Stephan von Krawczynski @ 2001-11-02 21:27 UTC (permalink / raw) To: linux-kernel; +Cc: phillips On Fri, 2 Nov 2001 13:13:10 -0800 (PST) Linus Torvalds <torvalds@transmeta.com> wrote: > - /* Don't swap out areas which are locked down */ > - if (vma->vm_flags & (VM_LOCKED|VM_RESERVED)) > + /* Don't swap out areas which are reserved */ > + if (vma->vm_flags & VM_RESERVED) > return count; Although I agree what you said about differences of old and new VM, I believe the above was not really what Ben intended to do by mlocking. I mean, you swap them out right now, or not? Regards, Stephan ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: Google's mm problem - not reproduced on 2.4.13 2001-11-02 21:27 ` Stephan von Krawczynski @ 2001-11-03 0:16 ` Linus Torvalds 0 siblings, 0 replies; 37+ messages in thread From: Linus Torvalds @ 2001-11-03 0:16 UTC (permalink / raw) To: linux-kernel In article <20011102222754.2366f1f5.skraw@ithnet.com>, Stephan von Krawczynski <skraw@ithnet.com> wrote: > >> - /* Don't swap out areas which are locked down */ >> - if (vma->vm_flags & (VM_LOCKED|VM_RESERVED)) >> + /* Don't swap out areas which are reserved */ >> + if (vma->vm_flags & VM_RESERVED) >> return count; > >Although I agree what you said about differences of old and new VM, I believe >the above was not really what Ben intended to do by mlocking. I mean, you swap >them out right now, or not? Not. See where I added the VM_LOCKED test - deep down in the page-out code it will decide that a VM_LOCKED page is always accessed, and will move it to the active list instead of swapping it out. Linus ^ permalink raw reply [flat|nested] 37+ messages in thread
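To make the answer above concrete, here is a small user-space model of the two checks as they read after the patch. It is only a sketch: the function names are hypothetical, the flag values are illustrative assumptions rather than copies from the kernel headers, and the real code works on vmas and ptes, not on plain integers and booleans.

```c
#include <stdbool.h>

/* Illustrative flag bits; the values are assumptions for this sketch. */
#define VM_LOCKED   0x2000UL
#define VM_RESERVED 0x80000UL

/* Models the first hunk: a pte in a locked vma is always treated as
 * recently accessed, so the page is marked accessed (and can move to
 * the active list) instead of being paged out. */
bool treat_as_accessed(unsigned long vm_flags, bool pte_young)
{
    return (vm_flags & VM_LOCKED) || pte_young;
}

/* Models the second hunk: only reserved vmas are skipped outright;
 * locked vmas are now scanned so the check above can see them. */
bool skip_vma(unsigned long vm_flags)
{
    return (vm_flags & VM_RESERVED) != 0;
}
```

In this model a locked vma is no longer skipped by the scan, yet every page in it takes the "accessed" path, which matches the "Not" above: the pages get scanned and activated, never swapped out.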
end of thread, other threads:[~2001-11-03 23:02 UTC | newest]
Thread overview: 37+ messages
2001-10-31 18:06 Google's mm problem - not reproduced on 2.4.13 Daniel Phillips
2001-10-31 20:39 ` Daniel Phillips
2001-10-31 20:45 ` Andrea Arcangeli
2001-10-31 21:03 ` Daniel Phillips
2001-10-31 21:53 ` Andreas Dilger
2001-11-01 4:52 ` Daniel Phillips
2001-11-01 16:56 ` undefined reference in 2.2.19 build with Reiserfs (was: Google's mm problem - not reproduced on 2.4.13) Sven Heinicke
2001-11-01 22:39 ` Keith Owens
2001-10-31 22:12 ` Google's mm problem - not reproduced on 2.4.13 Ben Smith
2001-11-01 0:34 ` Andrea Arcangeli
2001-11-02 17:51 ` Sven Heinicke
2001-11-02 18:00 ` Andrea Arcangeli
2001-11-02 18:19 ` Daniel Phillips
2001-11-02 20:27 ` Linus Torvalds
2001-11-02 21:08 ` Ben Smith
2001-11-02 21:20 ` Linus Torvalds
2001-11-02 22:42 ` Ben Smith
2001-11-02 23:15 ` Daniel Phillips
2001-11-03 22:53 ` Adaptec vs Symbios performance Stephan von Krawczynski
2001-11-03 23:01 ` arjan
2001-11-02 21:12 ` Google's mm problem - not reproduced on 2.4.13 Rik van Riel
[not found] ` <200111022027.fA2KRwe20006@penguin.transmeta.com>
2001-11-02 20:58 ` Daniel Phillips
2001-11-02 18:11 ` Daniel Phillips
2001-11-02 18:48 ` Sven Heinicke
2001-11-02 18:57 ` Daniel Phillips
2001-11-01 0:19 ` Daniel Phillips
2001-11-01 0:29 ` Andrea Arcangeli
2001-11-01 1:17 ` Ben Smith
2001-11-01 1:41 ` Rik van Riel
2001-11-01 1:55 ` Ben Smith
2001-11-01 2:06 ` Andrea Arcangeli
2001-10-31 20:48 ` Rik van Riel
2001-10-31 21:04 ` Daniel Phillips
2001-10-31 21:08 ` Rik van Riel
[not found] <Pine.LNX.4.33.0111021250560.20078-100000@penguin.transmeta.com>
2001-11-02 21:13 ` Linus Torvalds
2001-11-02 21:27 ` Stephan von Krawczynski
2001-11-03 0:16 ` Linus Torvalds