* [2.4.17/18pre] VM and swap - it's really unusable
@ 2001-12-28 20:16 Andreas Hartmann
2001-12-28 20:32 ` Rik van Riel
` (5 more replies)
0 siblings, 6 replies; 49+ messages in thread
From: Andreas Hartmann @ 2001-12-28 20:16 UTC (permalink / raw)
To: Kernel-Mailingliste
Hello all,

Again, I ran an rsync operation as described in
"[2.4.17rc1] Swapping", MID <3C1F4014.2010705@athlon.maya.org>.

This time, the kernel had a swap partition of about 200MB. Once the
swap partition was completely full, the kernel killed all processes of knode.
Nearly 50% of RAM was being used for buffers at that moment. Why is
so much memory used for buffers?

I know I'm repeating myself, but please:

Fix the VM management in kernel 2.4.x. It's unusable. Believe
me! For comparison: kernel 2.2.19 needed almost no swap for
the same operation!

Please consider that I have 512 MB of RAM. This should, or rather
must, be enough to do the rsync operation almost without any swapping -
kernel 2.2.19 manages it!

The performance of kernel 2.4.18pre1 is very poor, which is no surprise,
because the machine swaps almost nonstop.

Regards,
Andreas Hartmann
^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [2.4.17/18pre] VM and swap - it's really unusable
2001-12-28 20:16 [2.4.17/18pre] VM and swap - it's really unusable Andreas Hartmann
@ 2001-12-28 20:32 ` Rik van Riel
[not found] ` <3C2CD9EC.1D6C798E@zip.com.au>
` (4 subsequent siblings)
5 siblings, 0 replies; 49+ messages in thread
From: Rik van Riel @ 2001-12-28 20:32 UTC (permalink / raw)
To: Andreas Hartmann; +Cc: linux-kernel

On Fri, 28 Dec 2001, Andreas Hartmann wrote:

> Fix the VM-management in kernel 2.4.x. It's unusable. Believe
> me! As comparison: kernel 2.2.19 didn't need nearly any swap for
> the same operation!

If you feel adventurous you can try my rmap based VM, the
latest version is on:

http://surriel.com/patches/2.4/2.4.17-rmap-8

This VM should behave a bit better (it does on my machines),
but isn't yet bug-free enough to be used on production machines.
Also, the changes it introduces are, IMHO, too big for a stable
kernel series ;)

regards,

Rik
--
DMCA, SSSCA, W3C? Who cares? http://thefreeworld.net/
http://www.surriel.com/ http://distro.conectiva.com/

^ permalink raw reply [flat|nested] 49+ messages in thread
[parent not found: <3C2CD9EC.1D6C798E@zip.com.au>]
* Re: [2.4.17/18pre] VM and swap - it's really unusable
[not found] ` <3C2CD9EC.1D6C798E@zip.com.au>
@ 2001-12-28 21:26 ` Andreas Hartmann
0 siblings, 0 replies; 49+ messages in thread
From: Andreas Hartmann @ 2001-12-28 21:26 UTC (permalink / raw)
To: Andrew Morton; +Cc: Kernel-Mailingliste

Andrew Morton wrote:
> Andreas Hartmann wrote:
>
>>Hello all,
>>
>>Again, I did a rsync-operation as described in
>>"[2.4.17rc1] Swapping" MID <3C1F4014.2010705@athlon.maya.org>.
>>
>>This time, the kernel had a swappartition which was about 200MB. As the
>>swap-partition was fully used, the kernel killed all processes of knode.
>>Nearly 50% of RAM had been used for buffers at this moment. Why is there
>>so much memory used for buffers?
>>
>
> It's very strange. The large amount of buffercache usage is to
> be expected from statting 20 gigs worth of files, but the kernel
> should (and normally does) free up that memory on demand.
>
> Which filesystem(s) are you using?
>
> Are you using NFS/NBD/SMBFS or anything like that?
>

Basically, I'm using NFS and reiserfs. But I haven't used any file on NFS
since the last reboot, and the NFS shares haven't been mounted.

There are 2 IDE hard disks in this machine:

hda: WDC WD205AA, ATA DISK drive
(40079088 sectors (20520 MB) w/2048KiB cache, CHS=2494/255/63, UDMA(66))
hdb: WDC WD450AA-00BAA0, ATA DISK drive
(87930864 sectors (45021 MB) w/2048KiB Cache, CHS=5473/255/63, UDMA(66))

On hda, I have 7 partitions (plus one small "boot" partition, which
isn't mounted, and a 200MB swap partition).
On hdb, I have 12 partitions plus one more swap partition, which is by now 1GB.

All partitions are formatted with reiserfs.

Regards,
Andreas Hartmann

^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [2.4.17/18pre] VM and swap - it's really unusable
2001-12-28 20:16 [2.4.17/18pre] VM and swap - it's really unusable Andreas Hartmann
2001-12-28 20:32 ` Rik van Riel
[not found] ` <3C2CD9EC.1D6C798E@zip.com.au>
@ 2001-12-29 0:30 ` Alan Cox
2001-12-29 13:14 ` Andreas Hartmann
` (2 subsequent siblings)
5 siblings, 0 replies; 49+ messages in thread
From: Alan Cox @ 2001-12-29 0:30 UTC (permalink / raw)
To: Andreas Hartmann; +Cc: Kernel-Mailingliste

> Fix the VM-management in kernel 2.4.x. It's unusable. Believe
> me! As comparison: kernel 2.2.19 didn't need nearly any swap for
> the same operation!
> The performance of kernel 2.4.18pre1 is very poor, which is no surprise,
> because the machine swaps nearly nonstop.

Does the 2.4.9 Red Hat kernel (if you are using RH) or 2.4.12-ac8 show
the same problem?

^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [2.4.17/18pre] VM and swap - it's really unusable
2001-12-28 20:16 [2.4.17/18pre] VM and swap - it's really unusable Andreas Hartmann
` (2 preceding siblings ...)
2001-12-29 0:30 ` Alan Cox
@ 2001-12-29 13:14 ` Andreas Hartmann
2001-12-29 15:15 ` Andrea Arcangeli
2002-01-03 20:23 ` Ken Brownfield
5 siblings, 0 replies; 49+ messages in thread
From: Andreas Hartmann @ 2001-12-29 13:14 UTC (permalink / raw)
To: Kernel-Mailingliste

Andreas Hartmann wrote:
> Hello all,
>
> Again, I did a rsync-operation as described in
> "[2.4.17rc1] Swapping" MID <3C1F4014.2010705@athlon.maya.org>.
>

Some other examples:

I just ran

cp -Rd linux-2.4.16 linux-2.4.17

(with object files). Before starting, I had about 120 MB of free RAM.
During the copy (I did nothing else in the meantime), 2MB of swap was in
use and 12 MB of RAM were free. Most of the memory was used for caching,
which is OK.

After the copy, only 10 MB of memory were freed again. 490MB of RAM were
now in use (almost all of it for caching).

Starting from this situation, I ran another small copy:

cp -Rd linux-2.4.18pre1 linux-2.4.test

(again including object files).
Result: swap usage stayed nearly constant; nevertheless, there were
6 accesses to swap.

Then I deleted the linux-2.4.test directory with

rm -R linux-2.4.test

This was very fast (approximately 1s). Afterwards, a large part of the
cache memory was freed (about 100MB), so 122MB of RAM were free again.

Next example (run after the previous one):

The SuSE run-crons jobs were running. This means:
-> updatedb
-> sort
-> frcode
-> find
-> mandb

47MB of swap used, 2/3 of memory used for buffers (don't forget: I've got
512MB of RAM) and about 30MB of RAM free.

My observation:

Why does the kernel swap in order to get free memory for caching /
buffering? I can't see any sense in this. Wouldn't it be better to limit
the caching / buffering RAM to the amount of memory that is actually free?

Swapping should mainly be used when RAM runs out for real memory (memory
that is used by running applications).

IMHO, the memory used for cache and buffers should be reduced first,
before the kernel starts to swap.

Or would it be possible to implement more than one swapping strategy,
selectable during make menuconfig? This would give the user the chance
to find the best swapping strategy for his purpose.

Regards,
Andreas Hartmann

^ permalink raw reply [flat|nested] 49+ messages in thread
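The free / cache / swap figures quoted in the message above can be read in one step rather than by eye. The following is only a minimal sketch, not something posted in the thread: it assumes the report uses "Key: N kB" lines as in /proc/meminfo, the helper names parse_meminfo and memory_breakdown are hypothetical, and the sample numbers are made up for illustration.

```python
# Sketch: summarise a /proc/meminfo-style report (assumed "Key: N kB" lines).

def parse_meminfo(text):
    """Return {key: kB} for every line of the form 'Key:   12345 kB'."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        fields = rest.split()
        # Keep only "Key: <number> kB" lines; any other table or header
        # line that may precede them is skipped.
        if sep and len(fields) == 2 and fields[0].isdigit() and fields[1] == "kB":
            info[key.strip()] = int(fields[0])
    return info


def memory_breakdown(info):
    """Split memory into free, buffers+cache, and swap in use (all in kB)."""
    cache = info.get("Buffers", 0) + info.get("Cached", 0)
    return {
        "free_kb": info["MemFree"],
        "cache_kb": cache,
        "cache_pct": 100.0 * cache / info["MemTotal"],
        "swap_used_kb": info["SwapTotal"] - info["SwapFree"],
    }


# Illustrative numbers only (not measurements from the thread).
SAMPLE = """\
MemTotal:       524288 kB
MemFree:         30720 kB
Buffers:        262144 kB
Cached:         131072 kB
SwapTotal:      204800 kB
SwapFree:       156672 kB
"""

summary = memory_breakdown(parse_meminfo(SAMPLE))
print(summary)
```

On a live system the text would come from reading /proc/meminfo before and after each cp or rm step, which makes the "swap grew while cache stayed full" pattern easy to log.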
* Re: [2.4.17/18pre] VM and swap - it's really unusable
2001-12-28 20:16 [2.4.17/18pre] VM and swap - it's really unusable Andreas Hartmann
` (3 preceding siblings ...)
2001-12-29 13:14 ` Andreas Hartmann
@ 2001-12-29 15:15 ` Andrea Arcangeli
2002-01-03 20:23 ` Ken Brownfield
5 siblings, 0 replies; 49+ messages in thread
From: Andrea Arcangeli @ 2001-12-29 15:15 UTC (permalink / raw)
To: Andreas Hartmann; +Cc: Kernel-Mailingliste

On Fri, Dec 28, 2001 at 09:16:38PM +0100, Andreas Hartmann wrote:
> Hello all,
>
> Again, I did a rsync-operation as described in
> "[2.4.17rc1] Swapping" MID <3C1F4014.2010705@athlon.maya.org>.
>
> This time, the kernel had a swappartition which was about 200MB. As the
> swap-partition was fully used, the kernel killed all processes of knode.
> Nearly 50% of RAM had been used for buffers at this moment. Why is there
> so much memory used for buffers?
>
> I know I repeat it, but please:
>
> Fix the VM-management in kernel 2.4.x. It's unusable. Believe
> me! As comparison: kernel 2.2.19 didn't need nearly any swap for
> the same operation!
>
> Please consider that I'm using 512 MB of RAM. This should, or better:
> must be enough to do the rsync-operation nearly without any swapping -
> kernel 2.2.19 does it!
>
> The performance of kernel 2.4.18pre1 is very poor, which is no surprise,
> because the machine swaps nearly nonstop.

please try to reproduce on 2.4.17rc2aa2, thanks.

ftp://ftp.us.kernel.org/pub/linux/kernel/people/andrea/kernels/v2.4/2.4.17rc2aa2.bz2

Andrea

^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [2.4.17/18pre] VM and swap - it's really unusable
2001-12-28 20:16 [2.4.17/18pre] VM and swap - it's really unusable Andreas Hartmann
` (4 preceding siblings ...)
2001-12-29 15:15 ` Andrea Arcangeli
@ 2002-01-03 20:23 ` Ken Brownfield
2002-01-03 20:50 ` Rik van Riel
` (3 more replies)
5 siblings, 4 replies; 49+ messages in thread
From: Ken Brownfield @ 2002-01-03 20:23 UTC (permalink / raw)
To: Andreas Hartmann; +Cc: Kernel-Mailingliste

Unfortunately, I lost the response that basically said "2.4 looks stable
to me", but let me count the ways in which I agree with Andreas'
sentiment:

A) VM has major issues
   1) about a dozen recent OOPS reports in VM code
   2) VM falls down on large-memory machines with a
      high inode count (slocate/updatedb, i/dcache)
   3) Memory allocation failures and OOM triggers
      even though caches remain full.
   4) Other bugs fixed in -aa and others
B) Live- and dead-locks that I'm seeing on all 2.4 production
   machines > 2.4.9, possibly related to A. But how will I
   ever find out?
C) IO-APIC code that requires noapic on any and all SMP
   machines that I've ever run on.

I don't have anything against anyone here -- I think everyone is doing a
fine job. It's an issue of acceptance of the problem and focus. These
issues are all showstoppers for me, and while I don't represent the 90%
of the Linux market that is UP desktops, IMHO future work on the kernel
will be degraded by basic functionality that continues to cause
problems.

I think seeing some of the patches from Andrea, Andrew, et al. actually
*happen* would be a good thing, since 2.4 kernels are decidedly not ready
for production here. I am forced to apply 26 distinct patch sets to my
kernels, and I am NOT the right person to make these judgements. Which
is why I was interested in an LKML summary source, though I haven't yet
had a chance to catch up on that thread of comment.

Having a glitch in the radeon driver is one thing; having persistent,
fatal, and reproducible failures in universal kernel code is entirely
another.
--
Ken.
brownfld@irridia.com

On Fri, Dec 28, 2001 at 09:16:38PM +0100, Andreas Hartmann wrote:
| Hello all,
|
| Again, I did a rsync-operation as described in
| "[2.4.17rc1] Swapping" MID <3C1F4014.2010705@athlon.maya.org>.
|
| This time, the kernel had a swappartition which was about 200MB. As the
| swap-partition was fully used, the kernel killed all processes of knode.
| Nearly 50% of RAM had been used for buffers at this moment. Why is there
| so much memory used for buffers?
|
| I know I repeat it, but please:
|
| Fix the VM-management in kernel 2.4.x. It's unusable. Believe
| me! As comparison: kernel 2.2.19 didn't need nearly any swap for
| the same operation!
|
| Please consider that I'm using 512 MB of RAM. This should, or better:
| must be enough to do the rsync-operation nearly without any swapping -
| kernel 2.2.19 does it!
|
| The performance of kernel 2.4.18pre1 is very poor, which is no surprise,
| because the machine swaps nearly nonstop.
|
|
| Regards,
| Andreas Hartmann
|
| -
| To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
| the body of a message to majordomo@vger.kernel.org
| More majordomo info at http://vger.kernel.org/majordomo-info.html
| Please read the FAQ at http://www.tux.org/lkml/

^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [2.4.17/18pre] VM and swap - it's really unusable
2002-01-03 20:23 ` Ken Brownfield
@ 2002-01-03 20:50 ` Rik van Riel
2002-01-03 21:54 ` Andrew Morton
` (2 subsequent siblings)
3 siblings, 0 replies; 49+ messages in thread
From: Rik van Riel @ 2002-01-03 20:50 UTC (permalink / raw)
To: Ken Brownfield; +Cc: Andreas Hartmann, Kernel-Mailingliste

On Thu, 3 Jan 2002, Ken Brownfield wrote:

> A) VM has major issues
>    1) about a dozen recent OOPS reports in VM code
>    2) VM falls down on large-memory machines with a
>       high inode count (slocate/updatedb, i/dcache)
>    3) Memory allocation failures and OOM triggers
>       even though caches remain full.
>    4) Other bugs fixed in -aa and others
> B) Live- and dead-locks that I'm seeing on all 2.4 production
>    machines > 2.4.9, possibly related to A. But how will I
>    ever find out?

I've spent ages trying to fix these bugs in the -ac kernel,
but they got all backed out in search of better performance.

Right now I'm developing a VM again, but I have no interest
at all in fixing the livelocks in the main kernel, they'll
just get removed again after a while.

If you want to test my VM stuff, you can get patches from
http://surriel.com/patches/ or direct access at the bitkeeper
tree on http://linuxvm.bkbits.net/

cheers,

Rik
--
Shortwave goes a long way: irc.starchat.net #swl
http://www.surriel.com/ http://distro.conectiva.com/

^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [2.4.17/18pre] VM and swap - it's really unusable
2002-01-03 20:23 ` Ken Brownfield
2002-01-03 20:50 ` Rik van Riel
@ 2002-01-03 21:54 ` Andrew Morton
2002-01-04 4:56 ` Ken Brownfield
2002-01-04 0:19 ` Stephan von Krawczynski
2002-01-11 20:41 ` Ken Brownfield
3 siblings, 1 reply; 49+ messages in thread
From: Andrew Morton @ 2002-01-03 21:54 UTC (permalink / raw)
To: Ken Brownfield; +Cc: Andreas Hartmann, Kernel-Mailingliste

Ken Brownfield wrote:
>
> Unfortunately, I lost the response that basically said "2.4 looks stable
> to me", but let me count the ways in which I agree with Andreas'
> sentiment:
>
> A) VM has major issues
>    1) about a dozen recent OOPS reports in VM code

Ben LaHaise's fix for page_cache_release() is absolutely required.

>    2) VM falls down on large-memory machines with a
>       high inode count (slocate/updatedb, i/dcache)
>    3) Memory allocation failures and OOM triggers
>       even though caches remain full.
>    4) Other bugs fixed in -aa and others
> B) Live- and dead-locks that I'm seeing on all 2.4 production
>    machines > 2.4.9, possibly related to A. But how will I
>    ever find out?

Does this happen with the latest -aa patch? If so, please send
a full system description and report.

> C) IO-APIC code that requires noapic on any and all SMP
>    machines that I've ever run on.

Dunno about this one. Have you prepared a description?

-

^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [2.4.17/18pre] VM and swap - it's really unusable
2002-01-03 21:54 ` Andrew Morton
@ 2002-01-04 4:56 ` Ken Brownfield
0 siblings, 0 replies; 49+ messages in thread
From: Ken Brownfield @ 2002-01-04 4:56 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-kernel

Actually, I posted about C) many moons ago, and had some chats with
Manfred Spraul and Alan. It's a tough one to crack, and I have my own
workaround patch (below) that I've been using for a while now. My posts
are in the archives, but I can send a summary on request.

I haven't succeeded my bag check in putting -aa in production, which is
where I'm able to reproduce these problems. Part of the problem is me,
in that I can't easily test with -aa. And part of the problem is
chicken vs egg -- can't test unless it's in mainline, don't want to put
questionable stuff in a release kernel, even a -pre... But I do think
the -aa stuff is worth breaking out into Marcelo-digestible chunks as
soon as Andrea can.

The machines that are OOPSing are in production and right now don't have
serial consoles available... that will change in a month or so, but
right now I can't decode OOPSes without hand-copying. I might get that
desperate unless the problem goes away with 2.4.18 (with -aa merged,
hopefully. :)

Thanks much,
--
Ken.
brownfld@irridia.com

Applies to any recent 2.4. Changing indent sucks.

--- linux/arch/i386/kernel/io_apic.c.orig	Tue Nov 13 17:28:41 2001
+++ linux/arch/i386/kernel/io_apic.c	Tue Dec 18 15:10:45 2001
@@ -172,6 +172,7 @@
 int pirq_entries [MAX_PIRQS];
 int pirqs_enabled;
 int skip_ioapic_setup;
+int pintimer_setup;
 
 static int __init ioapic_setup(char *str)
 {
@@ -179,7 +180,14 @@
 	return 1;
 }
 
+static int __init do_pintimer_setup(char *str)
+{
+	pintimer_setup = 1;
+	return 1;
+}
+
 __setup("noapic", ioapic_setup);
+__setup("pintimer", do_pintimer_setup);
 
 static int __init ioapic_pirq_setup(char *str)
 {
@@ -1524,27 +1532,31 @@
 		printk(KERN_ERR "..MP-BIOS bug: 8254 timer not connected to IO-APIC\n");
 	}
 
-	printk(KERN_INFO "...trying to set up timer (IRQ0) through the 8259A ... ");
-	if (pin2 != -1) {
-		printk("\n..... (found pin %d) ...", pin2);
-		/*
-		 * legacy devices should be connected to IO APIC #0
-		 */
-		setup_ExtINT_IRQ0_pin(pin2, vector);
-		if (timer_irq_works()) {
-			printk("works.\n");
-			if (nmi_watchdog == NMI_IO_APIC) {
-				setup_nmi();
-				check_nmi_watchdog();
+	if ( pintimer_setup )
+		printk(KERN_INFO "...skipping 8259A init for IRQ0\n");
+	else {
+		printk(KERN_INFO "...trying to set up timer (IRQ0) through the 8259A ... ");
+		if (pin2 != -1) {
+			printk("\n..... (found pin %d) ...", pin2);
+			/*
+			 * legacy devices should be connected to IO APIC #0
+			 */
+			setup_ExtINT_IRQ0_pin(pin2, vector);
+			if (timer_irq_works()) {
+				printk("works.\n");
+				if (nmi_watchdog == NMI_IO_APIC) {
+					setup_nmi();
+					check_nmi_watchdog();
+				}
+				return;
 			}
-			return;
+			/*
+			 * Cleanup, just in case ...
+			 */
+			clear_IO_APIC_pin(0, pin2);
 		}
-		/*
-		 * Cleanup, just in case ...
-		 */
-		clear_IO_APIC_pin(0, pin2);
+		printk(" failed.\n");
 	}
-	printk(" failed.\n");
 
 	if (nmi_watchdog) {
 		printk(KERN_WARNING "timer doesnt work through the IO-APIC - disabling NMI Watchdog!\n");

On Thu, Jan 03, 2002 at 01:54:14PM -0800, Andrew Morton wrote:
| Ken Brownfield wrote:
| >
| > Unfortunately, I lost the response that basically said "2.4 looks stable
| > to me", but let me count the ways in which I agree with Andreas'
| > sentiment:
| >
| > A) VM has major issues
| >    1) about a dozen recent OOPS reports in VM code
|
| Ben LaHaise's fix for page_cache_release() is absolutely required.
|
| >    2) VM falls down on large-memory machines with a
| >       high inode count (slocate/updatedb, i/dcache)
| >    3) Memory allocation failures and OOM triggers
| >       even though caches remain full.
| >    4) Other bugs fixed in -aa and others
| > B) Live- and dead-locks that I'm seeing on all 2.4 production
| >    machines > 2.4.9, possibly related to A. But how will I
| >    ever find out?
|
| Does this happen with the latest -aa patch? If so, please send
| a full system description and report.
|
| > C) IO-APIC code that requires noapic on any and all SMP
| >    machines that I've ever run on.
|
| Dunno about this one. Have you prepared a description?
|
|
| -

^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [2.4.17/18pre] VM and swap - it's really unusable
2002-01-03 20:23 ` Ken Brownfield
2002-01-03 20:50 ` Rik van Riel
2002-01-03 21:54 ` Andrew Morton
@ 2002-01-04 0:19 ` Stephan von Krawczynski
2002-01-04 5:26 ` Ken Brownfield
2002-01-04 20:15 ` Andreas Hartmann
2002-01-11 20:41 ` Ken Brownfield
3 siblings, 2 replies; 49+ messages in thread
From: Stephan von Krawczynski @ 2002-01-04 0:19 UTC (permalink / raw)
To: Ken Brownfield; +Cc: Andreas Hartmann, Kernel-Mailingliste

> Unfortunately, I lost the response that basically said "2.4 looks stable
> to me", but let me count the ways in which I agree with Andreas'
> sentiment:
>
> A) VM has major issues

On all boxes I currently run (all with 1GB of RAM or less), I cannot find
_major_ issues.

>    2) VM falls down on large-memory machines with a
>       high inode count (slocate/updatedb, i/dcache)

Must be beyond the GB range.

>    3) Memory allocation failures and OOM triggers
>       even though caches remain full.

I have not had one up to now in everyday life with 2.4.17.

>    4) Other bugs fixed in -aa and others

Hm, well I would expect Andrea to do tuning and fixing as experience
evolves...

> B) Live- and dead-locks that I'm seeing on all 2.4 production
>    machines > 2.4.9, possibly related to A. But how will I
>    ever find out?

Me = none up to now that I could track down to a kernel issue. The single
one I had was with a distro kernel around 2.4.10 and flaky hardware.

> C) IO-APIC code that requires noapic on any and all SMP
>    machines that I've ever run on.

I am currently running 5 Asus CUV4X-D based SMP boxes, all with apic
_on_, among which are squids, sql servers, and workstation type setups
(2 my very own).

> I don't have anything against anyone here -- I think everyone is doing a
> fine job. It's an issue of acceptance of the problem and focus. These
> issues are all showstoppers for me, and while I don't represent the 90%
> of the Linux market that is UP desktops, IMHO future work on the kernel
> will be degraded by basic functionality that continues to cause
> problems.

Have you run _yourself_ into a problem with 2.4.17?
I mean it is not perfect of course, but it is far better than you make
it look.
I could hand the brown bag to all versions below about 2.4.15 pretty
easily, but since 2.4.16 it has really become hard for me to shoot it
down. Ok, I use only pretty selected hardware, but there are reasons I
do, and they are not related to the kernel in the first place.

Regards,
Stephan

^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [2.4.17/18pre] VM and swap - it's really unusable
2002-01-04 0:19 ` Stephan von Krawczynski
@ 2002-01-04 5:26 ` Ken Brownfield
2002-01-04 8:06 ` Ville Herva
2002-01-04 13:03 ` Stephan von Krawczynski
2002-01-04 20:15 ` Andreas Hartmann
1 sibling, 2 replies; 49+ messages in thread
From: Ken Brownfield @ 2002-01-04 5:26 UTC (permalink / raw)
To: Stephan von Krawczynski; +Cc: linux-kernel

On Fri, Jan 04, 2002 at 01:19:28AM +0100, Stephan von Krawczynski wrote:
| > A) VM has major issues
|
| On all boxes I run currently (all 1GB or below RAM), I cannot find
| _major_ issues.

Yeah, I'm seeing it primarily with 1-4GB, though I have very few <1GB
machines in production.

| >    2) VM falls down on large-memory machines with a
| >       high inode count (slocate/updatedb, i/dcache)
|
| Must be beyond the GB range.

The critical part is the high inode count -- memory amount increases the
severity rather than triggering the problem.

| >    3) Memory allocation failures and OOM triggers
| >       even though caches remain full.
|
| I have not had one up to now in everyday life with 2.4.17

I'm seeing this in malloc()-heavy apps, but only sporadically unless I
create a test case.

On desktops, most of these issues disappear, but I
do think the mindset behind the kernel needs to at least partially break
free of the grip of UP desktops, at least to the point of fixing issues
like I'm mentioning.

Not critical for me; but high-profile on lkml.

[...]
| > C) IO-APIC code that requires noapic on any and all SMP
| >    machines that I've ever run on.
|
| I am currently running 5 Asus CUV4X-D based SMP boxes all with apic
| _on_, amongst which are squids, sql servers, workstation type setups
| (2 my very own).

Do they have *sustained* heavy hit/IRQ/IO load? For example, sending
25Mbit and >1,000 connections/s of sustained small-image traffic
through khttpd will kill 2.4 (slow loss of timer and eventual total
freeze) in a couple of hours. Trivially reproducible for me on SMP with
any amount of memory. On HP, Tyan, Intel, Asus... etc.

| Have you run _yourself_ into a problem with 2.4.17?
| I mean it is not perfect of course, but it is far better than you make
| it look.

2.4.17 (and -pre/-rc) is my yardstick, actually. With the exception of
-aa, I stay very close to the bleeding edge.

Please don't misunderstand -- I don't think any 2.4 kernel sucks (with
the exception of the two or three DONTUSE kernels. :) In fact, I have
zero complaints other than the ones I've listed. I was ecstatic when
2.2 came out, and 2.4 is just as impressive.

It's not that the kernel is bad, it's that there are specific things
that shouldn't be forgotten because of a "the kernel is good"
evaluation. Especially those that make Linux regularly unstable in
common production environments.

| I could hand the brown bag to all versions below about 2.4.15 pretty
| easy, but since 2.4.16 it has really become hard to shoot it down for
| me. Ok, I use only pretty selected hardware, but there are reasons I
| do, and they are not related to the kernel in first place.

I use pretty selected hardware as well -- scaling hundreds of servers
for varied uses really depends on having someone track and select
hardware, and using it homogeneously.

Of course, of all of the selected hardware I've used over the last two
years since 2.4.0-test1, C) has persisted on all configurations; the
others are more recent but equally omnipresent.

Like I said, I suspect that most people with machines in lower-load
environments don't have these issues, but "number of people affected" is
only one metric to judge the importance of an issue.

Of course, I'm not biased or anything. ;-)

Thanks for the input,
--
Ken.
brownfld@irridia.com

|
| Regards,
| Stephan
|

^ permalink raw reply [flat|nested] 49+ messages in thread
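Since the message above singles out a high inode count (the inode and dentry caches) as the trigger, one way to put a number on it is to read the slab statistics. This is a rough sketch only, not something posted in the thread: it assumes /proc/slabinfo-style lines whose first four columns are name, active objects, total objects and object size, and that the caches are called inode_cache and dentry_cache (names differ between kernel versions and filesystems). The function name slab_usage and the sample figures are hypothetical.

```python
# Sketch: estimate memory held by the inode and dentry slab caches.
# Assumes each data line starts with: name active_objs total_objs objsize

def slab_usage(text, names=("inode_cache", "dentry_cache")):
    """Return per-cache object counts and approximate size in kB."""
    usage = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 4 or fields[0] not in names:
            continue
        try:
            active, total, objsize = (int(f) for f in fields[1:4])
        except ValueError:
            continue  # not a data line in the assumed layout
        usage[fields[0]] = {
            "active_objs": active,
            "total_objs": total,
            # Lower bound: ignores per-slab overhead and padding.
            "approx_kb": total * objsize // 1024,
        }
    return usage


# Illustrative numbers only (not measurements from the thread).
SAMPLE = """\
slabinfo - version: 1.1 (SMP)
inode_cache       400000 400050    512 57150 57150    1
dentry_cache      350000 350010    128 11667 11667    1
buffer_head       120000 120040     96  3001  3001    1
"""

for name, stats in slab_usage(SAMPLE).items():
    print(name, stats)
```

Sampling this before and after an updatedb run would show whether these caches keep growing while allocations start to fail, which is the pattern described in point 2) of the list.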
* Re: [2.4.17/18pre] VM and swap - it's really unusable
2002-01-04 5:26 ` Ken Brownfield
@ 2002-01-04 8:06 ` Ville Herva
2002-01-04 13:05 ` Stephan von Krawczynski
2002-01-04 13:03 ` Stephan von Krawczynski
1 sibling, 1 reply; 49+ messages in thread
From: Ville Herva @ 2002-01-04 8:06 UTC (permalink / raw)
To: Ken Brownfield; +Cc: Stephan von Krawczynski, linux-kernel

On Thu, Jan 03, 2002 at 11:26:01PM -0600, you [Ken Brownfield] claimed:
>
> | >    3) Memory allocation failures and OOM triggers
> | >       even though caches remain full.
> |
> | I have not had one up to now in everyday life with 2.4.17
>
> I'm seeing this in malloc()-heavy apps, but fairly sporadic unless I
> create a test case.

I'm seeing this on 2GB IA64 (2.4.16-17). I posted a _very_ simple test case
to lkml a while ago. It didn't happen on 256MB x86.

I plan to try -aa shortly, now that I have patches to make it compile on
IA64.

-- v --

v@iki.fi

^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [2.4.17/18pre] VM and swap - it's really unusable
2002-01-04 8:06 ` Ville Herva
@ 2002-01-04 13:05 ` Stephan von Krawczynski
0 siblings, 0 replies; 49+ messages in thread
From: Stephan von Krawczynski @ 2002-01-04 13:05 UTC (permalink / raw)
To: Ville Herva; +Cc: brownfld, linux-kernel

On Fri, 4 Jan 2002 10:06:05 +0200
Ville Herva <vherva@niksula.hut.fi> wrote:

> On Thu, Jan 03, 2002 at 11:26:01PM -0600, you [Ken Brownfield] claimed:
> >
> > | >    3) Memory allocation failures and OOM triggers
> > | >       even though caches remain full.
> > |
> > | I have not had one up to now in everyday life with 2.4.17
> >
> > I'm seeing this in malloc()-heavy apps, but fairly sporadic unless I
> > create a test case.
>
> I'm seeing this on 2GB IA64 (2.4.16-17). I posted a _very_ simple test case
> to lkml a while a go. It didn't happen on 256MB x86.
>
> I plan to try -aa shortly, now that I got patches to make it compile on
> IA64.

Ok, I am going to buy more mem right now to see what you see.

Regards,
Stephan

^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [2.4.17/18pre] VM and swap - it's really unusable
2002-01-04 5:26 ` Ken Brownfield
2002-01-04 8:06 ` Ville Herva
@ 2002-01-04 13:03 ` Stephan von Krawczynski
2002-01-04 23:50 ` Ken Brownfield
1 sibling, 1 reply; 49+ messages in thread
From: Stephan von Krawczynski @ 2002-01-04 13:03 UTC (permalink / raw)
To: Ken Brownfield; +Cc: linux-kernel

On Thu, 3 Jan 2002 23:26:01 -0600
Ken Brownfield <brownfld@irridia.com> wrote:

> On Fri, Jan 04, 2002 at 01:19:28AM +0100, Stephan von Krawczynski wrote:
> | > A) VM has major issues
> |
> | On all boxes I run currently (all 1GB or below RAM), I cannot find
> | _major_ issues.
>
> Yeah, I'm seeing it primarily with 1-4GB, though I have very few <1GB
> machines in production.

Ok. It would be really nice to know if the -aa patches do any good on your
configs. Andrea has possibly done something on the issue.
But let me take this chance to speak frankly: the last time Andrea talked
about his personal hardware I couldn't really believe it, because it was so
ridiculously small. I wonder if anyone at SuSE management _does_ actually
read this list and think about how someone can do a good job without good
equipment. If you really want to do something groundbreaking about highmem
you have to have a _box_. A box _somewhere_ in the world or a patch for
highmem-in-lowmem is not really the same thing. Even Schumacher wouldn't
have won Formula One sitting inside a Fiat Uno with a patched speedometer.

> but I
> do think the mindset behind the kernel needs to at least partially break
> free of the grip of UP desktops, at least to the point of fixing issues
> like I'm mentioning.
>
> Not critical for me; but high-profile on lkml.

You are right.

> [...]
> | > C) IO-APIC code that requires noapic on any and all SMP
> | >    machines that I've ever run on.
> |
> | I am currently running 5 Asus CUV4X-D based SMP boxes all with apic
> | _on_, amongst which are squids, sql servers, workstation type setups
> | (2 my very own).
>
> Do they have *sustained* heavy hit/IRQ/IO load? For example, sending
> 25Mbit and >1,000 connections/s of sustained small images traffic
> through khttpd will kill 2.4 (slow loss of timer and eventual total
> freeze) in a couple of hours. Trivially reproducable for me on SMP with
> any amount of memory. On HP, Tyan, Intel, Asus... etc.

Hm, I have about 24GB of NFS traffic every day, which may be too little.
What exactly are you seeing in this case (logfiles etc.)?

> It's not that the kernel is bad, it's that there are specific things
> that shouldn't be forgotten because of a "the kernel is good"
> evaluation.

Hopefully nobody does this here, I don't.

> Like I said, I suspect that most people with machines in lower-load
> environments don't have these issues, but "number of people effected" is
> only one metric to judge the importance of an issue.

The number of people is not really interesting to me; as the boxes get
bigger every day, it is only a matter of time before more people have lots
of GB (as an example).

> Of course, I'm not biased or anything. ;-)

How could you ? ;-))

Regards,
Stephan

^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [2.4.17/18pre] VM and swap - it's really unusable
2002-01-04 13:03 ` Stephan von Krawczynski
@ 2002-01-04 23:50 ` Ken Brownfield
2002-01-05 15:08 ` Stephan von Krawczynski
0 siblings, 1 reply; 49+ messages in thread
From: Ken Brownfield @ 2002-01-04 23:50 UTC (permalink / raw)
To: Stephan von Krawczynski; +Cc: linux-kernel

On Fri, Jan 04, 2002 at 02:03:21PM +0100, Stephan von Krawczynski wrote:
[...]
| Ok. It would be really nice to know if the -aa patches do any good at your

I'd love to, but unfortunately my problems reproduce only in production,
and -- nothing against Andrea -- I'm hesitant to deploy -aa live, since
it hasn't received the widespread use that mainline has. I may be
forced to soon if the VM fixes don't get merged.

[...]
| > Do they have *sustained* heavy hit/IRQ/IO load? For example, sending
| > 25Mbit and >1,000 connections/s of sustained small images traffic
| > through khttpd will kill 2.4 (slow loss of timer and eventual total
| > freeze) in a couple of hours. Trivially reproducable for me on SMP with
| > any amount of memory. On HP, Tyan, Intel, Asus... etc.
|
| Hm, I have about 24GB of NFS traffic every day, which may be too less. What
| exactly are you seeing in this case (logfiles etc.)?

Well, the nature of the problem is that the timer "slows" and stops,
causing the machine to get more and more sluggish until it falls off the
net and stops dead.

I suspect that high IRQ rates cause the issue -- large sequential
transfers are not necessarily culprits due to the lowish overhead.

[...]
| > It's not that the kernel is bad, it's that there are specific things
| > that shouldn't be forgotten because of a "the kernel is good"
| > evaluation.
|
| Hopefully nobody does this here, I don't.

I don't think it's intentional, and I realize that VM changes are hard
to swallow in a stable kernel release. I just hope that the severity
and fairly wide negative effect is enough to make people more
comfortable with accepting VM fixes that may be somewhat invasive.

Thanks,
--
Ken.
brownfld@irridia.com

^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [2.4.17/18pre] VM and swap - it's really unusable 2002-01-04 23:50 ` Ken Brownfield @ 2002-01-05 15:08 ` Stephan von Krawczynski 2002-01-05 21:40 ` Ken Brownfield ` (2 more replies) 0 siblings, 3 replies; 49+ messages in thread From: Stephan von Krawczynski @ 2002-01-05 15:08 UTC (permalink / raw) To: Ken Brownfield; +Cc: linux-kernel On Fri, 4 Jan 2002 17:50:50 -0600 Ken Brownfield <brownfld@irridia.com> wrote: > On Fri, Jan 04, 2002 at 02:03:21PM +0100, Stephan von Krawczynski wrote: > [...] > | Ok. It would be really nice to know if the -aa patches do any good at your > > I'd love to, but unfortunately my problems reproduce only in production, > and -- nothing against Andrea -- I'm hesitant to deploy -aa live, since > it hasn't received the widespread use that mainline has. I may be > forced to soon if the VM fixes don't get merged. I am pretty impressed by Martins test case where merely all VM patches fail with the exception of his own :-) The thing is, this test is not of nature "very special" but more like "system driven to limit by normal processes". And this is the real interesting part about it. > | Hm, I have about 24GB of NFS traffic every day, which may be too less. What > | exactly are you seeing in this case (logfiles etc.)? > > Well, the nature of the problem is that the timer "slows" and stops, > causing the machine to get more and more sluggish until it falls of the > net and stops dead. > > I suspect that high IRQ rates cause the issue -- large sequential > transfers are not necessarily culprits due the lowish overhead. What exactly do you mean with "high IRQ rate"? Can you show so numbers from /proc/interrupts and uptime for clarification? > | Hopefully nobody does this here, I don't. > > I don't think it's intentional, and I realize that VM changes are hard > to swallow in a stable kernel release. 
I just hope that the severity > and fairly wide negative effect is enough to make people more > comfortable with accepting VM fixes that may be somewhat invasive. Hm, I don't think real "big" patches are needed, Rik is according to Martins test no gain currently as rmap flops in this test, too. The problem is: you should really use one of your problem machines for at least very simple testing. If you don't you possibly cannot expect your problem to be solved soon. We would need input from your side. If I were you, I'd start of with Martins patch. It is simple (very simple indeed), small and pinned to a single procedure. Martins test shows - under "normal" high load (not especially IRQ) - good result and no difference in standard load, I cannot see a risk for oops or deadlock. Regards, Stephan ^ permalink raw reply [flat|nested] 49+ messages in thread
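The request above for numbers from /proc/interrupts together with uptime boils down to an average interrupt rate per IRQ line since boot. A minimal sketch of that arithmetic follows. The function names (sum_irq_line, irq_rate, print_irq_rates) are made up for illustration and are not from the thread, and the parser assumes the usual layout of a label, a colon, one counter per CPU, then the controller and device names.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Sum the per-CPU counters on one /proc/interrupts line, e.g.
 * " 14:   1200   3400   IO-APIC-edge  ide0" gives 4600.
 * Lines without a ':' (such as the CPU header line) give 0.
 * Assumption: the counters directly follow the colon and the first
 * non-numeric column ends them. */
unsigned long long sum_irq_line(const char *line)
{
    const char *p;
    unsigned long long total = 0;

    assert(line != NULL);
    p = strchr(line, ':');
    if (p == NULL)
        return 0;
    p++;
    for (;;) {
        char *end;
        unsigned long long n = strtoull(p, &end, 10);
        if (end == p)           /* no further numeric column */
            break;
        total += n;
        p = end;
    }
    return total;
}

/* Average interrupts per second since boot. */
double irq_rate(unsigned long long total, double uptime_seconds)
{
    return uptime_seconds > 0.0 ? (double)total / uptime_seconds : 0.0;
}

/* Print total and average rate for every counted line.
 * Returns 0 on success, -1 if a file cannot be read. */
int print_irq_rates(const char *interrupts_path, const char *uptime_path)
{
    double uptime = 0.0;
    char line[1024];
    FILE *f = fopen(uptime_path, "r");

    if (f == NULL)
        return -1;
    if (fscanf(f, "%lf", &uptime) != 1) {   /* first field: seconds up */
        fclose(f);
        return -1;
    }
    fclose(f);

    f = fopen(interrupts_path, "r");
    if (f == NULL)
        return -1;
    while (fgets(line, sizeof line, f) != NULL) {
        unsigned long long n = sum_irq_line(line);
        if (n > 0) {
            line[strcspn(line, ":")] = '\0';    /* keep only the label */
            printf("%6s: %12llu total, %10.1f/s\n",
                   line, n, irq_rate(n, uptime));
        }
    }
    fclose(f);
    return 0;
}
```

Calling print_irq_rates("/proc/interrupts", "/proc/uptime") from a small main() gives one line per IRQ. This is an average since boot, so a machine that was idle for days and then loaded will understate the rate under load; sampling twice and taking the difference would be the obvious refinement.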
* Re: [2.4.17/18pre] VM and swap - it's really unusable 2002-01-05 15:08 ` Stephan von Krawczynski @ 2002-01-05 21:40 ` Ken Brownfield 2002-01-06 15:48 ` Stephan von Krawczynski 2002-01-07 1:42 ` Rik van Riel 2002-01-08 15:19 ` Update " Ken Brownfield 2 siblings, 1 reply; 49+ messages in thread From: Ken Brownfield @ 2002-01-05 21:40 UTC (permalink / raw) To: Stephan von Krawczynski; +Cc: linux-kernel On Sat, Jan 05, 2002 at 04:08:33PM +0100, Stephan von Krawczynski wrote: | I am pretty impressed by Martins test case where merely all VM patches fail | with the exception of his own :-) The thing is, this test is not of nature | "very special" but more like "system driven to limit by normal processes". And | this is the real interesting part about it. One problem is that I've never heard of it and don't know where to get it. ;) | What exactly do you mean with "high IRQ rate"? Can you show so numbers from | /proc/interrupts and uptime for clarification? I did, back in the archives. I don't have easy access to archives etc, right now, but I might repost since it's been a while. | The problem is: you should really use one of your problem machines for at least | very simple testing. If you don't you possibly cannot expect your problem to be | solved soon. We would need input from your side. If I were you, I'd start of | with Martins patch. It is simple (very simple indeed), small and pinned to a | single procedure. Martins test shows - under "normal" high load (not especially | IRQ) - good result and no difference in standard load, I cannot see a risk for | oops or deadlock. Well, reboots are the problem over possible oopses (or data corruption, even more fun.) But on your recommendation I'll give Martin's mod a try, given a URL. Does Martin's patch play well with -aa? How about Martin+10_vm in -pre2? ;-) At any rate, right now there are three or four people with different VM patch sets, probably more. 
There is a certain amount of work this group can do in judging which concepts are cleaner or most suitable to 2.4.x. It would be cool to give rmap a try, but I don't want to maintain a 2.4.x kernel with speculative features that aren't intended for 2.4.x. I can see using patches back-ported from 2.5, but I'm a firm believer that 2.4 should stay stable and that the benefit of 2.4 to admins is the control by the maintainer and stability -- not the VM of the month. I can test, but it's slow going with so many patches. And many of the patches haven't been properly merged with any kernel (e.g., -aa 10_vm reverting previously applied 2.4 changes, etc.) While I've reproduced the issues and explained them here in the past, it's difficult for me to iterate fast enough in an environment that easily reproduces the problem. I'm iterating as fast as I can, but when I do iterate I'd prefer some support from the maintainers or other parts of the community that "Yes, this patch has a good chance of fixing the specific problems we've been seeing, give it a try." Right now that doesn't exist (with the exception of your recommendation of this Martin patch), and that's one reason I'm hesitant to iterate too much and affect a lot of people. Thanks, -- Ken. brownfld@irridia.com | | Regards, | Stephan | ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [2.4.17/18pre] VM and swap - it's really unusable 2002-01-05 21:40 ` Ken Brownfield @ 2002-01-06 15:48 ` Stephan von Krawczynski 2002-01-08 5:09 ` Ken Brownfield 0 siblings, 1 reply; 49+ messages in thread From: Stephan von Krawczynski @ 2002-01-06 15:48 UTC (permalink / raw) To: Ken Brownfield; +Cc: linux-kernel On Sat, 5 Jan 2002 15:40:53 -0600 Ken Brownfield <brownfld@irridia.com> wrote: > One problem is that I've never heard of it and don't know where to get > it. ;) [Sent in off-LKML mail] > | What exactly do you mean with "high IRQ rate"? Can you show so numbers from > | /proc/interrupts and uptime for clarification? > > I did, back in the archives. I don't have easy access to archives etc, > right now, but I might repost since it's been a while. I read all your LKML mails since beginning of November, could find a lot about cpu, configs,tops etc but not a single "cat /proc/interrupts" together with uptime. > Well, reboots are the problem over possible oopses (or data corruption, > even more fun.) But on your recommendation I'll give Martin's mod a > try, given a URL. Does Martin's patch play well with -aa? How about > Martin+10_vm in -pre2? ;-) According to the ongoings of your mails you seem to try really a lot of things to make it work out. I recommend not to intermix the patches a lot. I would stay close to marcelo's tree and try _single_ small patches on top of that. If you mix them up (even only two of them) you won't be able to track down very well, what is really better or worse. One thing I would like to ask here is this (as you are dealing with oracle stuff): why does oracle recommend to compile the kernel in 486 mode? I talked to someone who uses oracle on 2.4.x and he told me it is even in the latest docs. What is the voodoo behind that? Btw he has no freezes or the like, but occasional coredumps from oracle processes, which he states as "not nice, but no showstopper" as his clients reconnect/retransmit with only a slight delay. 
This may be related to VM, that's why I will try to convince him of some patches :-) and have a look at the coredump-frequency. Regards, Stephan ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [2.4.17/18pre] VM and swap - it's really unusable 2002-01-06 15:48 ` Stephan von Krawczynski @ 2002-01-08 5:09 ` Ken Brownfield 0 siblings, 0 replies; 49+ messages in thread From: Ken Brownfield @ 2002-01-08 5:09 UTC (permalink / raw) To: Stephan von Krawczynski; +Cc: linux-kernel On Sun, Jan 06, 2002 at 04:48:13PM +0100, Stephan von Krawczynski wrote: [...] | I read all your LKML mails since beginning of November, could find a lot about | cpu, configs,tops etc but not a single "cat /proc/interrupts" together with | uptime. http://web.irridia.com/info/linux/APIC/ This was published back in the beginning (4/2001), and additional stuff sent to Alan and Manfred for debugging. I was pushing my problem on LKML for a couple of weeks, but without much feedback I'm sticking to my workaround. This also feeds back to my earlier thoughts on some kind of LKML summary page of patches and problem reports for those disinclined to wade through the high LKML traffic. It's hard for me, much less you, to go back through the archives manually... | According to the ongoings of your mails you seem to try really a lot of things | to make it work out. I recommend not to intermix the patches a lot. I would | stay close to marcelo's tree and try _single_ small patches on top of that. If | you mix them up (even only two of them) you won't be able to track down very | well, what is really better or worse. Actually, that's why I don't test -aa. Whatever Marcelo chooses to include, I'll trust it in its entirety. But I've tested, for example, Linus' locked memory patch, and a couple of Andrew's isolated patches, all applied to mainline with nothing else. I can't try -aa because it has interdependencies and unintentional (I assume) backouts of code. | One thing I would like to ask here is this (as you are dealing with oracle | stuff): why does oracle recommend to compile the kernel in 486 mode? I talked | to someone who uses oracle on 2.4.x and he told me it is even in the latest | docs. 
What is the voodoo behind that? Btw he has no freezes or the like, but | occasional coredumps from oracle processes, which he states as "not nice, but | no showstopper" as his clients reconnect/retransmit with only a slight delay. | This may be related to VM, thats why I will try to convince him of some patches | :-) and have a look at the coredump-frequency. I haven't had any problems with Oracle at all since Linus' locked memory patch back in the 2.4.14-15ish days. This on a 4GB 6-way Xeon with ext2, reiser, couple of other complications, with the kernel compiled for P3. I really don't know what would cause Oracle to misbehave with an i686 kernel that wouldn't be a kernel bug. Perhaps a gcc-related bug? I'm still using 2.91.66 for kernels, although I've used 2.95.x with no problems. I'm not touching 2.96.x with a ten-foot pole, waiting instead for a sane 3.x one of these years. I think Oracle (the company) is a little short of tooth on Linux experience, since for example and AFAIK they never discovered the fatal 2.4 locked memory problem -- that took Google's report and to a much lesser extent my later discovery of the same problem. -- Ken. brownfld@irridia.com ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [2.4.17/18pre] VM and swap - it's really unusable 2002-01-05 15:08 ` Stephan von Krawczynski 2002-01-05 21:40 ` Ken Brownfield @ 2002-01-07 1:42 ` Rik van Riel 2002-01-07 2:22 ` Rik van Riel 2002-01-08 15:19 ` Update " Ken Brownfield 2 siblings, 1 reply; 49+ messages in thread From: Rik van Riel @ 2002-01-07 1:42 UTC (permalink / raw) To: Stephan von Krawczynski; +Cc: Ken Brownfield, linux-kernel On Sat, 5 Jan 2002, Stephan von Krawczynski wrote: > I am pretty impressed by Martins test case where merely all VM patches > fail with the exception of his own :-) No big wonder if both -aa and -rmap only get tested without swap ;) Rik -- Shortwave goes a long way: irc.starchat.net #swl http://www.surriel.com/ http://distro.conectiva.com/ ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [2.4.17/18pre] VM and swap - it's really unusable 2002-01-07 1:42 ` Rik van Riel @ 2002-01-07 2:22 ` Rik van Riel 2002-01-07 14:20 ` Stephan von Krawczynski 0 siblings, 1 reply; 49+ messages in thread From: Rik van Riel @ 2002-01-07 2:22 UTC (permalink / raw) To: Stephan von Krawczynski; +Cc: Ken Brownfield, linux-kernel On Sun, 6 Jan 2002, Rik van Riel wrote: > On Sat, 5 Jan 2002, Stephan von Krawczynski wrote: > > > I am pretty impressed by Martins test case where merely all VM patches > > fail with the exception of his own :-) > > No big wonder if both -aa and -rmap only get tested without swap ;) To be clear ... -aa and -rmap should of course also work nicely without swap, no excuses for the bad behaviour shown in Martin's test, but at the moment they simply don't seem tuned for it. regards, Rik -- Shortwave goes a long way: irc.starchat.net #swl http://www.surriel.com/ http://distro.conectiva.com/ ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [2.4.17/18pre] VM and swap - it's really unusable 2002-01-07 2:22 ` Rik van Riel @ 2002-01-07 14:20 ` Stephan von Krawczynski 2002-01-08 0:36 ` Rik van Riel 0 siblings, 1 reply; 49+ messages in thread From: Stephan von Krawczynski @ 2002-01-07 14:20 UTC (permalink / raw) To: Rik van Riel; +Cc: brownfld, linux-kernel On Mon, 7 Jan 2002 00:22:09 -0200 (BRST) Rik van Riel <riel@conectiva.com.br> wrote: > On Sun, 6 Jan 2002, Rik van Riel wrote: > > On Sat, 5 Jan 2002, Stephan von Krawczynski wrote: > > > > > I am pretty impressed by Martins test case where merely all VM patches > > > fail with the exception of his own :-) > > > > No big wonder if both -aa and -rmap only get tested without swap ;) > > To be clear ... -aa and -rmap should of course also work > nicely without swap, no excuses for the bad behaviour > shown in Martin's test, but at the moment they simply > don't seem tuned for it. Good to hear we agree it _should_ work. When does it (rmap)? ;-) Regards, Stephan ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [2.4.17/18pre] VM and swap - it's really unusable 2002-01-07 14:20 ` Stephan von Krawczynski @ 2002-01-08 0:36 ` Rik van Riel 0 siblings, 0 replies; 49+ messages in thread From: Rik van Riel @ 2002-01-08 0:36 UTC (permalink / raw) To: Stephan von Krawczynski; +Cc: brownfld, linux-kernel On Mon, 7 Jan 2002, Stephan von Krawczynski wrote: > > To be clear ... -aa and -rmap should of course also work > > nicely without swap, no excuses for the bad behaviour > > shown in Martin's test, but at the moment they simply > > don't seem tuned for it. > > Good to hear we agree it _should_ work. When does it (rmap)? > ;-) I integrated Ed Tomlinson's patch today and have made one more small change. In the patches I ran here things worked fine, the system avoids OOM now. Problem is, it doesn't seem to want to run the OOM killer when needed, at least not any time soon. I need to check out this code again later. Anyway, rmap-11 should work fine for your test. ;) regards, Rik -- Shortwave goes a long way: irc.starchat.net #swl http://www.surriel.com/ http://distro.conectiva.com/ ^ permalink raw reply [flat|nested] 49+ messages in thread
* Update Re: [2.4.17/18pre] VM and swap - it's really unusable
  2002-01-05 15:08 ` Stephan von Krawczynski
  2002-01-05 21:40 ` Ken Brownfield
  2002-01-07  1:42 ` Rik van Riel
@ 2002-01-08 15:19 ` Ken Brownfield
  2 siblings, 0 replies; 49+ messages in thread
From: Ken Brownfield @ 2002-01-08 15:19 UTC (permalink / raw)
To: Stephan von Krawczynski, M.H.VanLeeuwen, akpm; +Cc: linux-kernel

I stayed at work all night banging out tests on a few of our machines here. I took 2.4.18-pre2 and 2.4.18-pre2 with the vmscan patch from "M.H.VanLeeuwen" <vanl@megsinet.net>.

My sustained test consisted of this type of load:

	ls -lR / > /dev/null &
	/usr/bin/slocate -u -f "nfs,smbfs,ncpfs,proc,devpts" -e "/tmp,/var/tmp,/usr/tmp,/afs,/net" &
	dd if=/dev/sda3 of=/sda3 bs=1024k &
	# Hit TUX on this machine repeatedly; html page with 1000 images
	# Wait for memory to be mostly used by buff/page cache
	./a.out &
	# repeat finished commands -- keep all commands running
	# after a.out finishes, allow buff/page to refill before repeating

The a.out in this case is a little program (attached, c.c) to allocate and write to an amount of memory equal to physical RAM. The example I chose below is from a 2xP3/600 with 1GB of RAM and 2GB swap.

This was not a formal benchmark -- I think benchmarks have been presented before by other folks, and looking at benchmarks does not necessarily indicate the real-world problems that exist. My intent was to reproduce the issues I've been seeing, and then apply the MH (and only the MH) patch and observe.

2.4.18-pre2

Once slocate starts and gets close to filling RAM with buffer/page cache, kupdated and kswapd have periodic spikes of 50-100% CPU. When a.out starts, kswapd and kupdated begin to eat significant portions of CPU (20-100%) and I/O becomes more and more sluggish as a.out allocates.

When a.out uses all free RAM and should begin eating cache, significant swapping begins and cache is not decreased significantly until the machine goes 100-200MB into swap.
Here are two readprofile outputs, sorted by ticks and load.

	229689 default_idle              4417.0962
	  4794 file_read_actor             18.4385
	   405 __rdtsc_delay               14.4643
	  3763 do_anonymous_page           14.0410
	  3796 statm_pgd_range              9.7835
	  1535 prune_icache                 6.9773
	   153 __free_pages                 4.7812
	  1420 create_bounce                4.1765
	   583 sym53c8xx_intr               3.9392
	   221 atomic_dec_and_lock          2.7625
	  5214 generic_file_write           2.5659
	273464 total                        0.1903

	234168 default_idle              4503.2308
	  5298 generic_file_write           2.6073
	  4868 file_read_actor             18.7231
	  3799 statm_pgd_range              9.7912
	  3763 do_anonymous_page           14.0410
	  1535 prune_icache                 6.9773
	  1526 shrink_cache                 1.6234
	  1469 create_bounce                4.3206
	   643 rmqueue                      1.1320
	   591 sym53c8xx_intr               3.9932
	   505 __make_request               0.2902

2.4.18-pre2 with MH

With the MH patch applied, the issues I witnessed above did not seem to reproduce. Memory allocation under pressure seemed faster and smoother. kswapd never went above 5-15% CPU. When a.out allocated memory, it did not begin swapping until buffer/page cache had been nearly completely cannibalized.

And when a.out caused swapping, it was controlled and behaved like you would expect the VM to behave -- slowly swapping out unused pages instead of large swap write-outs without the patch.

Martin, have you done throughput benchmarks with MH/rmap/aa, BTW?

But both kernels still seem to be sluggish when it comes to doing small I/O operations (vi, ls, etc) during heavy swapping activity.
Here are the readprofile results:

	206243 default_idle              3966.2115
	  6486 file_read_actor             24.9462
	   409 __rdtsc_delay               14.6071
	  2798 do_anonymous_page           10.4403
	   185 __free_pages                 5.7812
	  1846 statm_pgd_range              4.7577
	   469 sym53c8xx_intr               3.1689
	   176 atomic_dec_and_lock          2.2000
	   349 end_buffer_io_async          1.9830
	   492 refill_inactive              1.8358
	    94 system_call                  1.8077
	245776 total                        0.1710

	216238 default_idle              4158.4231
	  6486 file_read_actor             24.9462
	  2799 do_anonymous_page           10.4440
	  1855 statm_pgd_range              4.7809
	  1611 generic_file_write           0.7928
	   839 __make_request               0.4822
	   820 shrink_cache                 0.7374
	   540 rmqueue                      0.9507
	   534 create_bounce                1.5706
	   492 refill_inactive              1.8358
	   487 sym53c8xx_intr               3.2905

There may be significant differences in the profile outputs for those with VM fu.

Summary: MH swaps _after_ cache has been properly cannibalized, and swapping activity starts when expected and is properly throttled. kswapd and kupdated don't seem to go into berserk 100% CPU mode.

At any rate, I now have the MH patch (and Andrew Morton's mini-ll and read-latency2 patches) in production, and I like what I see so far. I'd vote for them to go into 2.4.18, IMHO. Maybe the full low-latency patch if it's not truly 2.5 material.

My next cook-off will be with -aa and rmap, although if the rather small MH patch fixes my last issues it may be worth putting all VM effort into a 2.5 VM cook-off. :) Hopefully the useful stuff in -aa can get pulled in at some point soon, though.

Thanks much to Martin H. VanLeeuwen for his patch and Stephan von Krawczynski for his recommendations. I'll let MH cook for a while and I'll follow up later.
--
Ken.
brownfld@irridia.com

c.c:

#include <stdio.h>
#include <stdlib.h>	/* malloc, exit */
#include <string.h>	/* memcpy */
#include <unistd.h>	/* sleep */

#define MB_OF_RAM 1024

int main()
{
	long stuffsize = (long)MB_OF_RAM * 1048576 ;
	char *stuff ;

	if ( (stuff = (char *)malloc( stuffsize )) != NULL ) {
		long chunksize = 1048576 ;
		long c ;

		/* zero the first chunk, then copy it over the rest so
		   that every page is actually written to */
		for ( c=0 ; c<chunksize ; c++ )
			*(stuff+c) = '\0' ;
		/* hack; last chunk discarded if stuffsize%chunksize != 0 */
		for ( ; (c+chunksize)<=stuffsize ; c+=chunksize )
			memcpy( stuff+c, stuff, chunksize );
		sleep( 120 );
	}
	else
		printf("OOPS\n");

	exit( 0 );
}

On Sat, Jan 05, 2002 at 04:08:33PM +0100, Stephan von Krawczynski wrote:
[...]
| I am pretty impressed by Martins test case where merely all VM patches fail
| with the exception of his own :-) The thing is, this test is not of nature
| "very special" but more like "system driven to limit by normal processes". And
| this is the real interesting part about it.
[...]

^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [2.4.17/18pre] VM and swap - it's really unusable 2002-01-04 0:19 ` Stephan von Krawczynski 2002-01-04 5:26 ` Ken Brownfield @ 2002-01-04 20:15 ` Andreas Hartmann 2002-01-04 20:55 ` Stephan von Krawczynski 2002-01-05 9:24 ` Petro 1 sibling, 2 replies; 49+ messages in thread From: Andreas Hartmann @ 2002-01-04 20:15 UTC (permalink / raw) To: Stephan von Krawczynski; +Cc: Ken Brownfield, Kernel-Mailingliste Stephan von Krawczynski wrote: >>Unfortunately, I lost the response that basically said "2.4 looks >>stable >>to me", but let me count the ways in which I agree with Andreas' >>sentiment: >> >>A) VM has major issues Unfortunately you are right. >> > > On all boxes I run currently (all 1GB or below RAM), I cannot find > _major_ issues. Question is: which nature is your application / load of the system? You wrote something about database server. How much rows alltogether? What's the size of the table(s)? How many concurrent accesses do you have? Do you do "easy" searches where all of the conditions are located in the index? How big is your index? How big is the throughput of your database? Do you have your tables on raw partitions (without caching; as you can do it with UDB)? You mentioned squid, too. I'm running squid here on a AMD K6 2 400, 256 MB RAM. It's mostly (sometimes plus my wife) for my own. No more users. In this situation, I can't see any problem, too. Why? There is no load, no throughput, ... . How big are the partitions you are mounting at once? In my case, all the partitions together have about 70GB (all reiserfs). I want to know it, because I think the problem depends on how much different HD-memory is accessed. If you have applications, which doesn't access to much memory, you can't view the problems. If you access more than 1G (and you do not just copy, but rsync e.g.) and you have only 512MB of RAM, the machine swaps a lot with most actual 2.4.-kernels (patches). 
Another question: Are there any tools to measure the data throughput an application causes? Interesting would be the sum at the end of the process, the maximum and average throughput (input and output separated) and the same for swap activity. It could probably help to find optimization potential. At least it would give the chance to directly compare the demand of different applications. Regards, Andreas Hartmann ^ permalink raw reply [flat|nested] 49+ messages in thread
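One rough way to get per-application totals of the kind asked for above is to run the program as a child and read its resource usage once it exits. The sketch below uses the standard getrusage() call with RUSAGE_CHILDREN. The helper names run_and_measure and print_usage_summary are invented for this example. A large caveat: how many of the struct rusage fields a given kernel actually fills in varies, and on kernels of this era the block I/O and swap counters may well read as zero, so treat this as a starting point, not a finished tool. It also gives totals only, not the maximum or average rate.

```c
#define _XOPEN_SOURCE 600   /* for fork, gettimeofday, getrusage */
#include <assert.h>
#include <stdio.h>
#include <sys/resource.h>
#include <sys/time.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run argv[0] with its arguments and wait for it to finish.
 * Fills *ru with the resource usage of waited-for children and
 * *elapsed with the wall-clock run time in seconds.
 * Returns the child's exit status, or -1 on error or abnormal exit. */
int run_and_measure(char *const argv[], struct rusage *ru, double *elapsed)
{
    struct timeval t0, t1;
    int status;
    pid_t pid;

    assert(argv != NULL && argv[0] != NULL && ru != NULL && elapsed != NULL);

    gettimeofday(&t0, NULL);
    pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        execvp(argv[0], argv);
        _exit(127);                     /* exec failed */
    }
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    gettimeofday(&t1, NULL);

    if (getrusage(RUSAGE_CHILDREN, ru) < 0)
        return -1;
    *elapsed = (double)(t1.tv_sec - t0.tv_sec)
             + (double)(t1.tv_usec - t0.tv_usec) / 1e6;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}

/* Print the counters that relate to I/O and paging. */
void print_usage_summary(const struct rusage *ru, double elapsed)
{
    assert(ru != NULL);
    printf("elapsed:          %.2f s\n", elapsed);
    printf("user CPU:         %ld.%06ld s\n",
           (long)ru->ru_utime.tv_sec, (long)ru->ru_utime.tv_usec);
    printf("system CPU:       %ld.%06ld s\n",
           (long)ru->ru_stime.tv_sec, (long)ru->ru_stime.tv_usec);
    printf("block input ops:  %ld\n", ru->ru_inblock);
    printf("block output ops: %ld\n", ru->ru_oublock);
    printf("major faults:     %ld\n", ru->ru_majflt);
    printf("swaps:            %ld\n", ru->ru_nswap);
}
```

A small main() that passes its own argv + 1 to run_and_measure() and then calls print_usage_summary() turns this into a wrapper in the style of time(1). Note that RUSAGE_CHILDREN accumulates over all children the caller has waited for, so the numbers are only per-application if the wrapper runs a single child.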
* Re: [2.4.17/18pre] VM and swap - it's really unusable 2002-01-04 20:15 ` Andreas Hartmann @ 2002-01-04 20:55 ` Stephan von Krawczynski 2002-01-05 8:39 ` Andreas Hartmann 2002-01-05 9:24 ` Petro 1 sibling, 1 reply; 49+ messages in thread From: Stephan von Krawczynski @ 2002-01-04 20:55 UTC (permalink / raw) To: Andreas Hartmann; +Cc: brownfld, linux-kernel On Fri, 04 Jan 2002 21:15:42 +0100 Andreas Hartmann <andihartmann@freenet.de> wrote: [I will answer not all of your questions, as this is a matter of business, too] > > On all boxes I run currently (all 1GB or below RAM), I cannot find > > _major_ issues. > > > Question is: which nature is your application / load of the system? Generally we do not drive the boxes up to the edge. Our philosophy is to throw money at the problem, before it actually arises. Yes, I can see the future ... ;-) > [...] Do you have your tables on raw partitions (without caching; as > you can do it with UDB)? No. > How big are the partitions you are mounting at once? In my case, all the > partitions together have about 70GB (all reiserfs). about 130 GB, all reiserfs. > I want to know it, because I think the problem depends on how much > different HD-memory is accessed. I guess you should tilt that theory. Have you already tried to throw a big SPARC at the problem? > If you have applications, which doesn't > access to much memory, you can't view the problems. > If you access more than 1G (and you do not just copy, but rsync e.g.) > and you have only 512MB of RAM, the machine swaps a lot with most actual > 2.4.-kernels (patches). Can you provide a simple and reproducible test case (e.g. some demo source), where things break? I am very willing to test it here. Regards, Stephan ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [2.4.17/18pre] VM and swap - it's really unusable 2002-01-04 20:55 ` Stephan von Krawczynski @ 2002-01-05 8:39 ` Andreas Hartmann 2002-01-05 12:59 ` M. Edward (Ed) Borasky 0 siblings, 1 reply; 49+ messages in thread From: Andreas Hartmann @ 2002-01-05 8:39 UTC (permalink / raw) To: Stephan von Krawczynski; +Cc: brownfld, linux-kernel Stephan von Krawczynski wrote: [...] >>If you have applications, which doesn't >>access to much memory, you can't view the problems. >>If you access more than 1G (and you do not just copy, but rsync e.g.) >>and you have only 512MB of RAM, the machine swaps a lot with most actual >>2.4.-kernels (patches). >> > > Can you provide a simple and reproducible test case (e.g. some demo source), > where things break? I am very willing to test it here. > It's easy - take a grown inn-newsserver-partition with reiserfs (*) (a lot of small files and a lot of directories), about 1,3 GB or more, and do a complete rsync to this partition to transport it somewhere else. But you have to do it with a existing target, no empty target, so that rsync must scan the whole target partition, too. I don't like special test-programs. They seldom show up the reality. What we need is a kernel that behaves fine in reality - not in testcases. And before starting the test, take care, that most of ram is already used for cache or buffers or applications. I did this test with several VM-patches and there are huge differences in swap consumption between them: 319MB with 2.4.17rc2 and 59MB with 2.4.17 oom-patch (max). It's more than a little difference :-). Regards, Andreas Hartmann (*) If I had DSL, I would send it to you (as tar.gz) - but with modem, it's a bit too much :-)! But your squid cache should be fine, too. It has a similar structure: a lot of small files and a lot of subdirectories. But I think, that your squid cache size isn't as high as my inn-partition. ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [2.4.17/18pre] VM and swap - it's really unusable
  2002-01-05  8:39 ` Andreas Hartmann
@ 2002-01-05 12:59 ` M. Edward (Ed) Borasky
  2002-01-05 15:09 ` Andreas Hartmann
  2002-01-06 15:51 ` vda
  0 siblings, 2 replies; 49+ messages in thread
From: M. Edward (Ed) Borasky @ 2002-01-05 12:59 UTC (permalink / raw)
To: Andreas Hartmann; +Cc: Stephan von Krawczynski, brownfld, linux-kernel

On Sat, 5 Jan 2002, Andreas Hartmann wrote:

> I don't like special test-programs. They seldom show up the reality.
> What we need is a kernel that behaves fine in reality - not in
> testcases. And before starting the test, take care, that most of ram
> is already used for cache or buffers or applications.

OK, here's some pseudo-code for a real-world test case. I haven't had a chance to code it up, but I'm guessing I know what it's going to do. I'd *love* to be proved wrong :).

# build and boot a kernel with "Magic SysRq" turned on
# echo 1 > /proc/sys/kernel/sysrq
# fire up "nice --19 top" as "root"
# read "MemTotal" from /proc/meminfo

# now start the next two jobs concurrently

# write a disk file with "MemTotal" data or more in it

# perform a 2D in-place FFT of total size at least "MemTotal/2" but less
# than "MemTotal"

Watch the "top" window like a hawk. "Cached" will grow because of the disk write and "free" will drop because the page cache is growing and the 2D FFT is using *its* memory. Eventually the two will start competing for the last bits of free memory. "kswapd" and "kupdated" will start working furiously, bringing the system CPU utilization to 99+ percent.

At this point the system will appear highly unresponsive. Even with the "nice --19" setting, "top" is going to have a hard time keeping its five-second screen updates going. You will quite possibly end up going to the console and doing alt-sysrq-m, which dumps the memory status on the console and into /var/log/messages. Then if you do alt-sysrq-i, which kills everything but "init", you should be able to log on again.
I'm going to try this on my 512 MB machine just to see what happens, but I'd like to see what someone with a larger machine, say 4 GB, gets when they do this. I think attempting to write a large file and do a 2D FFT concurrently is a perfectly reasonable thing to expect an image processing system to do in the real world. A "traditional" UNIX would do the I/O of the file write and the compute/memory processing of the FFT together with little or no problem. But because the 2.4 kernel insists on keeping all those buffers around, the 2D FFT is going to have difficulty, because it has to have its data in core. What's worse is if the page cache gets so big that the FFT has to start swapping. For those who aren't familiar with 2D FFTs, they take two passes over the data. The first pass will be unit strides -- sequential addresses. But the second pass will be large strides -- a power of two. That second pass is going to be brutal if every page it hits has to be swapped in! The solution is to limit page cache size to, say, 1/4 of "MemTotal", which I'm guessing will have a *negligible* impact on the performance of the file write. I used to work in an image processing lab, which is where I learned this little trick for bringing a VM to its knees, and which is probably where the designers of other UNIX systems learned that the memory used for buffering I/O needs to be limited :). There's probably a VAX or two out there still that shudders when it remembers what I did to it. :)) -- M. Edward Borasky znmeb@borasky-research.net http://www.borasky-research.net ^ permalink raw reply [flat|nested] 49+ messages in thread
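The two-pass access pattern described in the message above (unit stride first, then a large power-of-two stride) can be imitated without any FFT code at all. The following is a minimal sketch under stated assumptions: the function name sweep is invented here, the matrix is a plain byte array rather than complex samples, and the page number is derived from the byte offset, which matches real pages only if the buffer is page-aligned. It touches every element and counts how often two consecutive accesses land on different pages.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Touch every element of a rows x cols byte matrix stored row by row.
 *   by_column == 0: unit stride, row after row (like the first pass)
 *   by_column != 0: stride of `cols` bytes, column after column
 *                   (like the second pass)
 * Returns how often two consecutive accesses fall on different pages,
 * with pages counted from the start of the buffer. */
size_t sweep(unsigned char *m, size_t rows, size_t cols,
             size_t page_size, int by_column)
{
    size_t outer = by_column ? cols : rows;
    size_t inner = by_column ? rows : cols;
    size_t switches = 0, last_page = 0, i, j;
    int first = 1;

    assert(m != NULL && page_size > 0);
    for (i = 0; i < outer; i++) {
        for (j = 0; j < inner; j++) {
            size_t off = by_column ? j * cols + i : i * cols + j;
            size_t page = off / page_size;

            m[off]++;                   /* write, so the page is dirtied */
            if (!first && page != last_page)
                switches++;
            last_page = page;
            first = 0;
        }
    }
    return switches;
}
```

When one row is exactly one page long, the row-wise sweep changes pages only at the end of each row, while the column-wise sweep changes pages on every single access. With a matrix larger than free memory, each of those page changes in the second sweep is a candidate page fault, which is the "brutal" case the message describes. To turn this into a stress test, one would allocate a buffer of between MemTotal/2 and MemTotal bytes and call sweep() once with by_column set to 0 and once with it set to 1 while the file write runs.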
* Re: [2.4.17/18pre] VM and swap - it's really unusable 2002-01-05 12:59 ` M. Edward (Ed) Borasky @ 2002-01-05 15:09 ` Andreas Hartmann 2002-01-05 17:51 ` M. Edward (Ed) Borasky 2002-01-06 15:51 ` vda 1 sibling, 1 reply; 49+ messages in thread From: Andreas Hartmann @ 2002-01-05 15:09 UTC (permalink / raw) To: M. Edward (Ed) Borasky; +Cc: Stephan von Krawczynski, brownfld, linux-kernel M. Edward (Ed) Borasky wrote: > On Sat, 5 Jan 2002, Andreas Hartmann wrote: > > >>I don't like special test-programs. They seldom show up the reality. >>What we need is a kernel that behaves fine in reality - not in >>testcases. And before starting the test, take care, that most of ram >>is already used for cache or buffers or applications. >> > > OK, here's some pseduo-code for a real-world test case. I haven't had a > chance to code it up, but I'm guessing I know what it's going to do. I'd > *love* to be proved wrong :). I would like to try it with the oom-patch, which needed less swap in my tests. It could be a good test to verify the results of the rsync-test. > # build and boot a kernel with "Magic SysRq" turned on > # echo 1 > /proc/sys/kernel/sysrq > # fire up "nice --19 top" as "root" > # read "MemTotal" from /proc/meminfo > > # now start the next two jobs concurrently > > # write a disk file with "MemTotal" data or more in it > > # perform a 2D in-place FFT of total size at least "MemTotal/2" but less > # than "MemTotal" > > Watch the "top" window like a hawk. "Cached" will grow because of the > disk write and "free" will drop because the page cache is growing and > the 2D FFT is using *its* memory. Could you please tell me a programm, that does 2D FFT? I would like to do this test, too! Regards, Andreas Hartmann ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [2.4.17/18pre] VM and swap - it's really unusable 2002-01-05 15:09 ` Andreas Hartmann @ 2002-01-05 17:51 ` M. Edward (Ed) Borasky 0 siblings, 0 replies; 49+ messages in thread From: M. Edward (Ed) Borasky @ 2002-01-05 17:51 UTC (permalink / raw) To: Andreas Hartmann; +Cc: Stephan von Krawczynski, brownfld, linux-kernel On Sat, 5 Jan 2002, Andreas Hartmann wrote: > Could you please tell me a programm, that does 2D FFT? I would like to > do this test, too! Try http://www.fftw.org. This is a public domain (GPL I think) general purpose FFT library. If I get a chance I'll download it this weekend and figure out how to code a 2D FFT. -- M. Edward Borasky znmeb@borasky-research.net http://www.borasky-research.net Never play leapfrog with a unicorn. ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [2.4.17/18pre] VM and swap - it's really unusable
2002-01-05 12:59 ` M. Edward (Ed) Borasky
2002-01-05 15:09 ` Andreas Hartmann
@ 2002-01-06 15:51 ` vda
2002-01-06 19:16 ` M. Edward (Ed) Borasky
1 sibling, 1 reply; 49+ messages in thread
From: vda @ 2002-01-06 15:51 UTC (permalink / raw)
To: M. Edward (Ed) Borasky; +Cc: linux-kernel
On 5 January 2002 10:59, M. Edward (Ed) Borasky wrote:
> OK, here's some pseudo-code for a real-world test case. I haven't had a
> chance to code it up, but I'm guessing I know what it's going to do. I'd
> *love* to be proved wrong :).
>
> # build and boot a kernel with "Magic SysRq" turned on
> # echo 1 > /proc/sys/kernel/sysrq
> # fire up "nice --19 top" as "root"
> # read "MemTotal" from /proc/meminfo
>
> # now start the next two jobs concurrently
>
> # write a disk file with "MemTotal" data or more in it
Like dd if=/dev/zero of=/tmp/file bs=... count=... ?
> # perform a 2D in-place FFT of total size at least "MemTotal/2" but less
> # than "MemTotal"
I'm willing to try. What program can I use for FFT?
> What's worse is if the page cache gets so big that the FFT has to start
> swapping. For those who aren't familiar with 2D FFTs, they take two
> passes over the data. The first pass will be unit strides -- sequential
> addresses. But the second pass will be large strides -- a power of two.
> That second pass is going to be brutal if every page it hits has to be
> swapped in!
Can you describe FFT memory access pattern in more detail?
I'd like to write a simple testcase with similar 'bad' pattern.
--
vda
^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [2.4.17/18pre] VM and swap - it's really unusable 2002-01-06 15:51 ` vda @ 2002-01-06 19:16 ` M. Edward (Ed) Borasky 2002-01-06 19:38 ` Alan Cox 0 siblings, 1 reply; 49+ messages in thread From: M. Edward (Ed) Borasky @ 2002-01-06 19:16 UTC (permalink / raw) To: vda@port.imtp.ilyichevsk.odessa.ua; +Cc: linux-kernel On Sun, 6 Jan 2002, vda@port.imtp.ilyichevsk.odessa.ua wrote: > Like dd if=/dev/zero of=/tmp/file bs=... count=... ? > That would do it, but I was trying to give a real-world example from image processing, like copying a large image file. > > # perform a 2D in-place FFT of total size at least "MemTotal/2" but less > > # than "MemTotal" > > I'm willing to try. What program can I use for FFT? I use FFTW from http://www.fftw.org. > Can you describe FFT memory access pattern in more detail? > I'd like to write a simple testcase with similar 'bad' pattern. Imagine a 16384 by 16384 array of double complex values. That's a 4 GByte image. Scale down to fit your machine, of course :). The first pass will do an FFT on every row (column) if your language is C (FORTRAN). The "stride" is 16 bytes (one complex value) in the inner loop. Each row (column) is 16384*16 = 262144 bytes long, which works out to 64 pages if the page size is 4096 bytes. Then the second pass will do an FFT on every column (row). The stride is 16384*16 = 262144 bytes. This is a new page for each 16-byte complex value you process :-). That is, all 16384 pages have to be in memory, or swapped into memory if you've run out of real memory and the kernel has swapped them out. Please ... *don't* try to do this on a 512 MB machine and think that an efficient VM is gonna make it work :), -- M. Edward Borasky znmeb@borasky-research.net http://www.borasky-research.net What phrase will you *never* hear Miss Piggy use? "You can't make a silk purse out of a sow's ear!" ^ permalink raw reply [flat|nested] 49+ messages in thread
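The stride arithmetic in Ed's description, and the "simple testcase with similar 'bad' pattern" vda asked for, might be sketched in C as follows. This is not an FFT: it only walks an n x n array of 16-byte elements first by rows and then by columns, which is the access pattern described above. The type and function names are invented for this sketch, and pages_per_column assumes the array starts on a page boundary.

```c
#include <stddef.h>
#include <stdlib.h>

/* Two doubles: 16 bytes, the same size as one double complex value. */
typedef struct { double re, im; } cplx;

/* Byte distance between consecutive elements in the column pass
 * (row-major C layout): one full row. */
size_t col_stride_bytes(size_t n)
{
    return n * sizeof(cplx);
}

/* Pages spanned by one row of n elements. */
size_t pages_per_row(size_t n, size_t page)
{
    return (n * sizeof(cplx) + page - 1) / page;
}

/* Distinct pages touched while walking one column, assuming the array
 * starts on a page boundary. */
size_t pages_per_column(size_t n, size_t page)
{
    size_t stride = col_stride_bytes(n);

    if (n == 0)
        return 0;
    return stride >= page ? n : ((n - 1) * stride) / page + 1;
}

/* Pass 1: unit stride, each row is read and written sequentially. */
double touch_rows(cplx *a, size_t n)
{
    double sum = 0.0;
    size_t i, j;

    for (i = 0; i < n; i++)
        for (j = 0; j < n; j++) {
            a[i * n + j].re += 1.0;
            sum += a[i * n + j].re;
        }
    return sum;
}

/* Pass 2: large stride, each step down a column jumps a whole row. */
double touch_cols(cplx *a, size_t n)
{
    double sum = 0.0;
    size_t i, j;

    for (j = 0; j < n; j++)
        for (i = 0; i < n; i++) {
            a[i * n + j].re += 1.0;
            sum += a[i * n + j].re;
        }
    return sum;
}
```

With n = 16384 and 4096-byte pages, these helpers give the figures from the message: a column stride of 262144 bytes, 64 pages per row, and 16384 distinct pages per column. To use it as a VM test, allocate the array with calloc(n * n, sizeof(cplx)), scale n to the machine, and time touch_cols while the file write is running.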
* Re: [2.4.17/18pre] VM and swap - it's really unusable 2002-01-06 19:16 ` M. Edward (Ed) Borasky @ 2002-01-06 19:38 ` Alan Cox 2002-01-07 0:47 ` M. Edward Borasky 0 siblings, 1 reply; 49+ messages in thread From: Alan Cox @ 2002-01-06 19:38 UTC (permalink / raw) To: "M. Edward (Ed) Borasky" Cc: vda@port.imtp.ilyichevsk.odessa.ua, linux-kernel > > > That would do it, but I was trying to give a real-world example from > image processing, like copying a large image file. Image processing people use tiling. Try loading a giant image into the gimp and into a non smart application like xpaint. The difference is huge just by careful implementation of the algorithms > Then the second pass will do an FFT on every column (row). The stride is > 16384*16 = 262144 bytes. This is a new page for each 16-byte complex > value you process :-). That is, all 16384 pages have to be in memory, or > swapped into memory if you've run out of real memory and the kernel has > swapped them out. Yes but you don't do it that way, you do stripes of parallel fft computations. We can all write dumb programs that don't behave well with the VM layer. Alan ^ permalink raw reply [flat|nested] 49+ messages in thread
* RE: [2.4.17/18pre] VM and swap - it's really unusable 2002-01-06 19:38 ` Alan Cox @ 2002-01-07 0:47 ` M. Edward Borasky 0 siblings, 0 replies; 49+ messages in thread From: M. Edward Borasky @ 2002-01-07 0:47 UTC (permalink / raw) To: linux-kernel You're right ... no one does an *out-of-core* 2D FFT using VM. What I am saying is that a large page cache can turn an *in-core* 2D FFT -- a 4 GB case on an 8 GB machine, for example -- into an out-of-core one! One other data point: on my stock Red Hat 7.2 box with 512 MB of RAM, I ran a Perl script that builds a 512 MByte hash, a second Perl script which creates a 512 MByte disk file, and the check pass of FFTW concurrently. As I expected, the two Perl scripts competed for RAM and slowed down FFTW. What was even more interesting, though, was that the VM apparently functions correctly in this instance. All three of the processes were getting CPU cycles. And I never saw "kswapd" or "kupdated" take over the system. Although the page cache did get large at one point, once the hash builder got to about 400 MBytes in size, the "cached" piece shrunk to about 10 MBytes and most of the RAM got allocated to the hash builder, as did appropriate amounts of swap. In short, the kernel in Red Hat 7.2 with under 1 GByte of memory is behaving well under memory pressure. It looks like it's kernels beyond that one that have the problems, and also systems with more than 1 GByte. If I had the money, I'd stuff some more RAM in the machine and see if I could isolate this a little further. If anyone wants my Perl scripts, which are trivial, let me know. -- M. Edward Borasky znmeb@borasky-research.net http://www.borasky-research.net ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [2.4.17/18pre] VM and swap - it's really unusable 2002-01-04 20:15 ` Andreas Hartmann 2002-01-04 20:55 ` Stephan von Krawczynski @ 2002-01-05 9:24 ` Petro 2002-01-05 15:44 ` Stephan von Krawczynski 1 sibling, 1 reply; 49+ messages in thread From: Petro @ 2002-01-05 9:24 UTC (permalink / raw) To: Andreas Hartmann; +Cc: Kernel-Mailingliste "We" (Auctionwatch.com) are experiencing problems that appear to be related to VM, I realize that this question was not directed at me: On Fri, Jan 04, 2002 at 09:15:42PM +0100, Andreas Hartmann wrote: > Stephan von Krawczynski wrote: > Question is: which nature is your application / load of the system? You > wrote something about database server. How much rows alltogether? What's Mysql running a dual 650 PIII, 2 gig ram. Rows? Dunno, but several million tables (about 85 gig of tables averaging 45-50k IIRC). > the size of the table(s)? How many concurrent accesses do you have? Do We will have 2-400+ tables open at once. > you do "easy" searches where all of the conditions are located in the > index? How big is your index? How big is the throughput of your > database? Do you have your tables on raw partitions (without caching; as > you can do it with UDB)? I don't know much about the specific design, other than I've been told it's non-optimal. > How big are the partitions you are mounting at once? In my case, all the > partitions together have about 70GB (all reiserfs). One 250G logical volume, a couple smaller ones (3 gig, 30 gig). -- Share and Enjoy. ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [2.4.17/18pre] VM and swap - it's really unusable 2002-01-05 9:24 ` Petro @ 2002-01-05 15:44 ` Stephan von Krawczynski 2002-01-07 7:15 ` Petro 0 siblings, 1 reply; 49+ messages in thread From: Stephan von Krawczynski @ 2002-01-05 15:44 UTC (permalink / raw) To: Petro; +Cc: andihartmann, linux-kernel On Sat, 5 Jan 2002 01:24:42 -0800 Petro <petro@auctionwatch.com> wrote: > "We" (Auctionwatch.com) are experiencing problems that appear to be > related to VM, I realize that this question was not directed at me: And how exactly do the problems look like? Regards, Stephan ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [2.4.17/18pre] VM and swap - it's really unusable 2002-01-05 15:44 ` Stephan von Krawczynski @ 2002-01-07 7:15 ` Petro 2002-01-07 14:33 ` Stephan von Krawczynski 0 siblings, 1 reply; 49+ messages in thread From: Petro @ 2002-01-07 7:15 UTC (permalink / raw) To: Stephan von Krawczynski; +Cc: andihartmann, linux-kernel On Sat, Jan 05, 2002 at 04:44:05PM +0100, Stephan von Krawczynski wrote: > On Sat, 5 Jan 2002 01:24:42 -0800 > Petro <petro@auctionwatch.com> wrote: > > > "We" (Auctionwatch.com) are experiencing problems that appear to be > > related to VM, I realize that this question was not directed at me: > > And how exactly do the problems look like? After some time, ranging from 1 to 48 hours, mysql quits in an unclean fashion (dies leaving tables improperly closed) with a dump in the mysql log file that looks like: > Here is the stack dump: > 0x807b75f handle_segfault__Fi + 383 > 0x812bcaa pthread_sighandler + 154 > 0x815059c chunk_free + 596 > 0x8152573 free + 155 > 0x811579c my_no_flags_free + 16 > 0x80764d5 _._5ilink + 61 > 0x807b48d end_thread__FP3THDb + 53 > 0x80809cc handle_one_connection__FPv + 996 Which the Mysql support team says appears to be memory corruption. Since this has happened on 4 different machines, and one of them had memtest86 run on it (coming up clean), they seem (witness Sasha's post) to think this may have something to do with the memory handling in the kernel. I haven't run it on a kernel that has debugging enabled yet, partially because I've been tracing a completely unrelated problems with our hard drives (IBM GXP 75G drives made in Hungary during the first 3 months of 2001), and partially because the only way to get this to happen is to put the database in production, which results in a crash, which takes our site offline, which costs us money and pisses off our users. Right now we're running on a sun e4500, and it's stable, so until we get the other problem worked out, we're waiting to see on this one. -- Share and Enjoy. 
^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [2.4.17/18pre] VM and swap - it's really unusable 2002-01-07 7:15 ` Petro @ 2002-01-07 14:33 ` Stephan von Krawczynski 2002-01-07 20:29 ` Petro 0 siblings, 1 reply; 49+ messages in thread From: Stephan von Krawczynski @ 2002-01-07 14:33 UTC (permalink / raw) To: Petro; +Cc: andihartmann, linux-kernel On Sun, 6 Jan 2002 23:15:31 -0800 Petro <petro@auctionwatch.com> wrote: > On Sat, Jan 05, 2002 at 04:44:05PM +0100, Stephan von Krawczynski wrote: > > On Sat, 5 Jan 2002 01:24:42 -0800 > > Petro <petro@auctionwatch.com> wrote: > > > > > "We" (Auctionwatch.com) are experiencing problems that appear to be > > > related to VM, I realize that this question was not directed at me: > > > > And how exactly do the problems look like? > > After some time, ranging from 1 to 48 hours, mysql quits in an > unclean fashion (dies leaving tables improperly closed) with a dump > in the mysql log file that looks like: mysql question: is this a binary from some distro or self-compiled? If self-compiled can you show your ./configure paras, please? > Which the Mysql support team says appears to be memory corruption. > Since this has happened on 4 different machines, and one of them had > memtest86 run on it (coming up clean), they seem (witness Sasha's > post) to think this may have something to do with the memory > handling in the kernel. There is a big difference between memory _corruption_ and a VM deficiency. No app can cope with a _corruption_ and is perfectly allowed to core dump or exit (or trash your disk). But this should not happen on allocation failures. Unless all your RAM is from the same series I do not really believe in mem corruption. I would try Martins small VM patch, as it looks like being a bit more efficient in low mem conditions and this may well be the case you are running into. This means 2.4.17 standard + patch. Regards, Stephan ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [2.4.17/18pre] VM and swap - it's really unusable 2002-01-07 14:33 ` Stephan von Krawczynski @ 2002-01-07 20:29 ` Petro 2002-01-08 1:43 ` Stephan von Krawczynski 0 siblings, 1 reply; 49+ messages in thread From: Petro @ 2002-01-07 20:29 UTC (permalink / raw) To: Stephan von Krawczynski; +Cc: andihartmann, linux-kernel On Mon, Jan 07, 2002 at 03:33:48PM +0100, Stephan von Krawczynski wrote: > On Sun, 6 Jan 2002 23:15:31 -0800 > Petro <petro@auctionwatch.com> wrote: > > On Sat, Jan 05, 2002 at 04:44:05PM +0100, Stephan von Krawczynski wrote: > > > On Sat, 5 Jan 2002 01:24:42 -0800 > > > Petro <petro@auctionwatch.com> wrote: > > > > "We" (Auctionwatch.com) are experiencing problems that appear to be > > > > related to VM, I realize that this question was not directed at me: > > > And how exactly do the problems look like? > > After some time, ranging from 1 to 48 hours, mysql quits in an > > unclean fashion (dies leaving tables improperly closed) with a dump > > in the mysql log file that looks like: > mysql question: is this a binary from some distro or self-compiled? If > self-compiled can you show your ./configure paras, please? It's the binary from mysql.com. > > Which the Mysql support team says appears to be memory corruption. > > Since this has happened on 4 different machines, and one of them had > > memtest86 run on it (coming up clean), they seem (witness Sasha's > > post) to think this may have something to do with the memory > > handling in the kernel. > There is a big difference between memory _corruption_ and a VM deficiency. No > app can cope with a _corruption_ and is perfectly allowed to core dump or exit > (or trash your disk). But this should not happen on allocation failures. > Unless all your RAM is from the same series I do not really believe in mem > corruption. I would try Martins small VM patch, as it looks like being a bit > more efficient in low mem conditions and this may well be the case you are > running into. 
This means 2.4.17 standard + patch. Is there a reasonable chance that martins patch will get mainlined in the near future? One of the big reasons I chose to upgrade to a later kernel version (from 2.4.8ac<something>+LVMpatches+...) was to get away from having to apply patches (and document which patches and where to get them etc). If this is the route I have to go, I'll do it but, well, I'm not that comfortable with it. -- Share and Enjoy. ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [2.4.17/18pre] VM and swap - it's really unusable 2002-01-07 20:29 ` Petro @ 2002-01-08 1:43 ` Stephan von Krawczynski 2002-01-08 3:10 ` Petro 0 siblings, 1 reply; 49+ messages in thread From: Stephan von Krawczynski @ 2002-01-08 1:43 UTC (permalink / raw) To: Petro; +Cc: andihartmann, linux-kernel > On Mon, Jan 07, 2002 at 03:33:48PM +0100, Stephan von Krawczynski wrote: > > mysql question: is this a binary from some distro or self-compiled? If > > self-compiled can you show your ./configure paras, please? > > It's the binary from mysql.com. Beta or stable release? > > [...] I would try Martins small VM patch, as it looks like being a bit > > more efficient in low mem conditions and this may well be the case you are > > running into. This means 2.4.17 standard + patch. > > Is there a reasonable chance that martins patch will get mainlined > in the near future? I really can't know. But to me the results look interesting enough to give it a try on certain problem situations (like yours) to find out if it is any better than the stock version. If you and others can confirm that things get better then I have no real doubts that Marcelo can pick it up. > One of the big reasons I chose to upgrade to a > later kernel version (from 2.4.8ac<something>+LVMpatches+...) was > to get away from having to apply patches (and document which > patches and where to get them etc). Well, there is really nothing wrong with upgrading mainline kernels, as the are getting better with every release, so I would always suggest to take the releases up lets say a week after being out. Only your situation maybe can help to improve more, if you input some of your experiences in LKML with a patch like Martins. Feedback _is_ required to find a solution to an existing problem. > If this is the route I have to go, I'll do it but, well, I'm not > that comfortable with it. Well, my suggestions: don't patch around too much, but try single patches on stock kernel and evaluate them here. 
Regards, Stephan ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [2.4.17/18pre] VM and swap - it's really unusable 2002-01-08 1:43 ` Stephan von Krawczynski @ 2002-01-08 3:10 ` Petro 2002-01-08 6:00 ` Petro 0 siblings, 1 reply; 49+ messages in thread From: Petro @ 2002-01-08 3:10 UTC (permalink / raw) To: Stephan von Krawczynski; +Cc: andihartmann, linux-kernel On Tue, Jan 08, 2002 at 02:43:42AM +0100, Stephan von Krawczynski wrote: > > On Mon, Jan 07, 2002 at 03:33:48PM +0100, Stephan von Krawczynski > wrote: > > > mysql question: is this a binary from some distro or > self-compiled? If > > > self-compiled can you show your ./configure paras, please? > > > > It's the binary from mysql.com. > > Beta or stable release? Stable. > > > more efficient in low mem conditions and this may well be the case you are > > > running into. This means 2.4.17 standard + patch. > > Is there a reasonable chance that martins patch will get mainlined > > in the near future? > > I really can't know. But to me the results look interesting enough to > give it a try on certain problem situations (like yours) to find out > if it is any better than the stock version. If you and others can > confirm that things get better then I have no real doubts that Marcelo > can pick it up. Out of ignorance and laziness, where is it again that I can get this kernel? > > One of the big reasons I chose to upgrade to a > > later kernel version (from 2.4.8ac<something>+LVMpatches+...) > was > > to get away from having to apply patches (and document which > > patches and where to get them etc). > > Well, there is really nothing wrong with upgrading mainline kernels, Funny, I went from a working 2.4.8-ac<x> to a non-working 2.4.13+patches when I started getting these crashes. At first I thought they were Mysql, so I called them. They said "Re-install windows", er, I mean upgrade my kernel to 2.4.16, which would "fix the problem", so I did, and it didn't. So they said to go to 2.4.17rc2 as that would fix my problem, only it didn't. 
> as the are getting better with every release, so I would always > suggest to take the releases up lets say a week after being out. Only Yeah, and build a debian package, distribute it to (looks behind me) 100+ linux servers, including 6 mission critical heavily loaded DB machines. Not to be a complete asswipe, but no. While I like playing with computers and all that, I don't have enough hours in the day to be rolling out new kernels every couple weeks and still have time left over to see my wife, shoot my guns, ride my motorcycles and drink my scotch. > your situation maybe can help to improve more, if you input some of > your experiences in LKML with a patch like Martins. Feedback _is_ > required to find a solution to an existing problem. I understand completely, I'm just trying to figure out a way to test this that doesn't impact my site as drastically. See, we've only got two databases that will cause this fault, and of course they are the two most important ones, and the only way we can generate this fault is to put them live and wait for them to crash. > > If this is the route I have to go, I'll do it but, well, I'm > not > > that comfortable with it. > > Well, my suggestions: don't patch around too much, but try single > patches on stock kernel and evaluate them here. There are 2 other patches I need to apply, the first is the LVM 1.0.1 patch, and the second is the VFS-lock patch. We need these to do snapshots. Which isn't bad, but I'm about the only one still here who can do it (violates hit-by-a-bus rule). -- Share and Enjoy. ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [2.4.17/18pre] VM and swap - it's really unusable 2002-01-08 3:10 ` Petro @ 2002-01-08 6:00 ` Petro 0 siblings, 0 replies; 49+ messages in thread From: Petro @ 2002-01-08 6:00 UTC (permalink / raw) To: Stephan von Krawczynski; +Cc: andihartmann, linux-kernel On Mon, Jan 07, 2002 at 07:10:01PM -0800, Petro wrote: > > can pick it up. > > Out of ignorance and laziness, where is it again that I can get this > kernel? Let me rephrase that. Out of ignorance and laziness, exactly which patch is it that I need, and where can I find it? -- Share and Enjoy. ^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [2.4.17/18pre] VM and swap - it's really unusable 2002-01-03 20:23 ` Ken Brownfield ` (2 preceding siblings ...) 2002-01-04 0:19 ` Stephan von Krawczynski @ 2002-01-11 20:41 ` Ken Brownfield 2002-01-11 21:13 ` Mark Hahn ` (2 more replies) 3 siblings, 3 replies; 49+ messages in thread From: Ken Brownfield @ 2002-01-11 20:41 UTC (permalink / raw) To: vanl; +Cc: linux-kernel After more testing, my original observations seem to be holding up, except that under heavy VM load (e.g., "make -j bzImage") the machine's overall performance seems far lower. For instance, without the patch the -j build finishes in ~10 minutes (2x933P3/256MB) but with the patch I haven't had the patience to let it finish after more than an hour. This is perhaps because the vmscan patch is too aggressively shrinking the caches, or causing thrashing in another area? I'm also noticing that the amount of swap used is nearly an order of magnitude higher, which doesn't make sense at first glance... Also, there are extended periods where idle CPU is 50-80%. Maybe the patch or at least its intent can be merged with Andrea's work if applicable? Thanks, -- Ken. brownfld@irridia.com On Thu, Jan 03, 2002 at 02:23:01PM -0600, Ken Brownfield wrote: | Unfortunately, I lost the response that basically said "2.4 looks stable | to me", but let me count the ways in which I agree with Andreas' | sentiment: | | A) VM has major issues | 1) about a dozen recent OOPS reports in VM code | 2) VM falls down on large-memory machines with a | high inode count (slocate/updatedb, i/dcache) | 3) Memory allocation failures and OOM triggers | even though caches remain full. | 4) Other bugs fixed in -aa and others | B) Live- and dead-locks that I'm seeing on all 2.4 production | machines > 2.4.9, possibly related to A. But how will I | ever find out? | C) IO-APIC code that requires noapic on any and all SMP | machines that I've ever run on. 
|
| I don't have anything against anyone here -- I think everyone is doing a
| fine job. It's an issue of acceptance of the problem and focus. These
| issues are all showstoppers for me, and while I don't represent the 90%
| of the Linux market that is UP desktops, IMHO future work on the kernel
| will be degraded by basic functionality that continues to cause
| problems.
|
| I think seeing some of Andrea's and Andrew's et al patches actually
| *happen* would be a good thing, since 2.4 kernels are decidedly not
| ready for production here. I am forced to apply 26 distinct patch sets
| to my kernels, and I am NOT the right person to make these judgements.
| Which is why I was interested in an LKML summary source, though I
| haven't yet had a chance to catch up on that thread of comment.
|
| Having a glitch in the radeon driver is one thing; having persistent,
| fatal, and reproducible failures in universal kernel code is entirely
| another.
|
| --
| Ken.
| brownfld@irridia.com
|
|
| On Fri, Dec 28, 2001 at 09:16:38PM +0100, Andreas Hartmann wrote:
| | Hello all,
| |
| | Again, I did a rsync-operation as described in
| | "[2.4.17rc1] Swapping" MID <3C1F4014.2010705@athlon.maya.org>.
| |
| | This time, the kernel had a swappartition which was about 200MB. As the
| | swap-partition was fully used, the kernel killed all processes of knode.
| | Nearly 50% of RAM had been used for buffers at this moment. Why is there
| | so much memory used for buffers?
| |
| | I know I repeat it, but please:
| |
| | Fix the VM-management in kernel 2.4.x. It's unusable. Believe
| | me! As comparison: kernel 2.2.19 didn't need nearly any swap for
| | the same operation!
| |
| | Please consider that I'm using 512 MB of RAM. This should, or better:
| | must be enough to do the rsync-operation nearly without any swapping -
| | kernel 2.2.19 does it!
| |
| | The performance of kernel 2.4.18pre1 is very poor, which is no surprise,
| | because the machine swaps nearly nonstop.
| |
| |
| | Regards,
| | Andreas Hartmann
^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [2.4.17/18pre] VM and swap - it's really unusable
2002-01-11 20:41 ` Ken Brownfield
@ 2002-01-11 21:13 ` Mark Hahn
2002-01-11 21:38 ` Ken Brownfield
2002-01-11 23:38 ` Rik van Riel
2002-01-11 21:23 ` Ken Brownfield
2002-01-12 0:13 ` M.H.VanLeeuwen
2 siblings, 2 replies; 49+ messages in thread
From: Mark Hahn @ 2002-01-11 21:13 UTC (permalink / raw)
To: linux-kernel
> overall performance seems far lower. For instance, without the patch
> the -j build finishes in ~10 minutes (2x933P3/256MB) but with the patch
please, PLEASE stop using "make -j"
for anything except the fork-bomb that it is.
pretending that it's a benchmark, especially one
to guide kernel tuning, is a travesty!
if you want to simulate VM load, do something sane like
boot with mem=32M, or a simple "mmap(lots); mlockall" tool.
regards, mark hahn.
^ permalink raw reply [flat|nested] 49+ messages in thread
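The "mmap(lots); mlockall" tool Mark mentions could look roughly like the C sketch below. It is a guess at what he has in mind, not code from the thread: grab_memory is a made-up name, and the standalone wrapper that takes a size in megabytes and then sleeps is an assumption about how such a tool would be used. Locking normally needs root or a raised RLIMIT_MEMLOCK.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE   /* for MAP_ANONYMOUS under strict -std= modes */
#endif
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/* Map `bytes` of anonymous memory and write to every byte so the pages
 * are really allocated.  If `lock` is non-zero, pin everything mapped so
 * far with mlockall() so the kernel cannot swap it out.  Returns the
 * mapping, or NULL on failure. */
void *grab_memory(size_t bytes, int lock)
{
    void *p = mmap(NULL, bytes, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    if (p == MAP_FAILED)
        return NULL;
    memset(p, 0xA5, bytes);
    if (lock && mlockall(MCL_CURRENT) != 0) {
        munmap(p, bytes);
        return NULL;
    }
    return p;
}

#ifdef STANDALONE
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Usage (hypothetical): ./grab <megabytes>, then leave it running. */
int main(int argc, char **argv)
{
    size_t mb = argc > 1 ? (size_t)strtoul(argv[1], NULL, 10) : 64;

    if (grab_memory(mb << 20, 1) == NULL) {
        perror("grab_memory");
        return 1;
    }
    printf("holding %lu MB locked; interrupt to release\n",
           (unsigned long)mb);
    pause();   /* keep the memory pinned until the process is killed */
    return 0;
}
#endif
```

On a 512 MB machine, holding roughly 480 MB this way should leave the rest of the system about as much memory as booting with mem=32M, without needing a reboot.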
* Re: [2.4.17/18pre] VM and swap - it's really unusable
2002-01-11 21:13 ` Mark Hahn
@ 2002-01-11 21:38 ` Ken Brownfield
2002-01-11 23:38 ` Rik van Riel
1 sibling, 0 replies; 49+ messages in thread
From: Ken Brownfield @ 2002-01-11 21:38 UTC (permalink / raw)
To: Mark Hahn; +Cc: linux-kernel
I don't think I made the claim that this was a benchmark -- I certainly
realize that "make -j bzImage" is not real-world, but it is clearly
indicative of heavy VM/CPU/context load. Since I don't believe this
patch is currently in the running for inclusion, I'm just giving general
feedback to the patch author rather than making a case.
For instance, "make -j bzImage" reproduced the ext3 bug that Andrew
found where my other VM-intensive apps did not. I doubt we should keep
the bug in the kernel because the situation isn't real-world enough.
But yes, a bug is worse than a behavior flaw, granted.
--
Ken.
brownfld@irridia.com
On Fri, Jan 11, 2002 at 04:13:00PM -0500, Mark Hahn wrote:
| > overall performance seems far lower. For instance, without the patch
| > the -j build finishes in ~10 minutes (2x933P3/256MB) but with the patch
|
| please, PLEASE stop using "make -j"
| for anything except the fork-bomb that it is.
| pretending that it's a benchmark, especially one
| to guide kernel tuning, is a travesty!
|
| if you want to simulate VM load, do something sane like
| boot with mem=32M, or a simple "mmap(lots); mlockall" tool.
|
| regards, mark hahn.
^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [2.4.17/18pre] VM and swap - it's really unusable
2002-01-11 21:13 ` Mark Hahn
2002-01-11 21:38 ` Ken Brownfield
@ 2002-01-11 23:38 ` Rik van Riel
1 sibling, 0 replies; 49+ messages in thread
From: Rik van Riel @ 2002-01-11 23:38 UTC (permalink / raw)
To: Mark Hahn; +Cc: linux-kernel
On Fri, 11 Jan 2002, Mark Hahn wrote:
> > overall performance seems far lower. For instance, without the patch
> > the -j build finishes in ~10 minutes (2x933P3/256MB) but with the patch
>
> please, PLEASE stop using "make -j"
> for anything except the fork-bomb that it is.
> pretending that it's a benchmark, especially one
> to guide kernel tuning, is a travesty!
Actually, it's as good a benchmark as any. Knowing how well
the system is able to recover from heavy overload situations
is useful to know if your server gets heavily overloaded at
times.
If one VM falls over horribly under half the load it takes
to make another VM go slower, I know which one I'd want on
my server.
> if you want to simulate VM load, do something sane like
> boot with mem=32M, or a simple "mmap(lots); mlockall" tool.
... and then you come up with something WAY less realistic
than 'make -j' ;)))
cheers,
Rik
--
"Linux holds advantages over the single-vendor commercial OS"
-- Microsoft's "Competing with Linux" document
http://www.surriel.com/ http://distro.conectiva.com/
^ permalink raw reply [flat|nested] 49+ messages in thread
* Re: [2.4.17/18pre] VM and swap - it's really unusable 2002-01-11 20:41 ` Ken Brownfield 2002-01-11 21:13 ` Mark Hahn @ 2002-01-11 21:23 ` Ken Brownfield 2002-01-12 0:13 ` M.H.VanLeeuwen 2 siblings, 0 replies; 49+ messages in thread From: Ken Brownfield @ 2002-01-11 21:23 UTC (permalink / raw) To: vanl; +Cc: linux-kernel Andrew Morton kindly pointed out that my crack pipe is dangerously empty and I didn't specify what patch I was talking about. In my defense, I was up all last night tracking down the ext3 bug that Andrew fixed right under me. ;) I replied to the wrong message, which I've pasted below. This is wrt Martin's VM patch per the previous discussion. Apologies, -- Ken. brownfld@irridia.com On Fri, Jan 11, 2002 at 02:41:17PM -0600, Ken Brownfield wrote: | After more testing, my original observations seem to be holding up, | except that under heavy VM load (e.g., "make -j bzImage") the machine's | overall performance seems far lower. For instance, without the patch | the -j build finishes in ~10 minutes (2x933P3/256MB) but with the patch | I haven't had the patience to let it finish after more than an hour. | | This is perhaps because the vmscan patch is too aggressively shrinking | the caches, or causing thrashing in another area? I'm also noticing | that the amount of swap used is nearly an order of magnitude higher, | which doesn't make sense at first glance... Also, there are extended | periods where idle CPU is 50-80%. | | Maybe the patch or at least its intent can be merged with Andrea's work | if applicable? | | Thanks, | -- | Ken. 
| brownfld@irridia.com

What I SHOULD have replied to:

| Date: Tue, 8 Jan 2002 09:19:57 -0600
| From: Ken Brownfield <brownfld@irridia.com>
| To: Stephan von Krawczynski <skraw@ithnet.com>,
|     "M.H.VanLeeuwen" <vanl@megsinet.net>, akpm@zip.com.au
| Cc: linux-kernel@vger.kernel.org
| Subject: Update Re: [2.4.17/18pre] VM and swap - it's really unusable
| User-Agent: Mutt/1.2.5.1i
| In-Reply-To: <20020105160833.0800a182.skraw@ithnet.com>; from skraw@ithnet.com on Sat, Jan 05, 2002 at 04:08:33PM +0100
| Precedence: bulk
| X-Mailing-List: linux-kernel@vger.kernel.org
|
| I stayed at work all night banging out tests on a few of our machines
| here. I took 2.4.18-pre2 and 2.4.18-pre2 with the vmscan patch from
| "M.H.VanLeeuwen" <vanl@megsinet.net>.
|
| My sustained test consisted of this type of load:
|
|     ls -lR / > /dev/null &
|     /usr/bin/slocate -u -f "nfs,smbfs,ncpfs,proc,devpts" -e "/tmp,/var/tmp,/usr/tmp,/afs,/net" &
|     dd if=/dev/sda3 of=/sda3 bs=1024k &
|     # Hit TUX on this machine repeatedly; html page with 1000 images
|     # Wait for memory to be mostly used by buff/page cache
|     ./a.out &
|     # repeat finished commands -- keep all commands running
|     # after a.out finishes, allow buff/page to refill before repeating
|
| The a.out in this case is a little program (attached, c.c) to allocate
| and write to an amount of memory equal to physical RAM. The example I
| chose below is from a 2xP3/600 with 1GB of RAM and 2GB swap.
|
| This was not a formal benchmark -- I think benchmarks have been
| presented before by other folks, and looking at benchmarks does not
| necessarily indicate the real-world problems that exist. My intent was
| to reproduce the issues I've been seeing, and then apply the MH (and
| only the MH) patch and observe.
|
| 2.4.18-pre2
|
| Once slocate starts and gets close to filling RAM with buffer/page
| cache, kupdated and kswapd have periodic spikes of 50-100% CPU.
|
| When a.out starts, kswapd and kupdated begin to eat significant portions
| of CPU (20-100%) and I/O becomes more and more sluggish as a.out
| allocates.
|
| When a.out uses all free RAM and should begin eating cache, significant
| swapping begins and cache is not decreased significantly until the
| machine goes 100-200MB into swap.
|
| Here are two readprofile outputs, sorted by ticks and load.
|
| 229689 default_idle          4417.0962
|   4794 file_read_actor         18.4385
|    405 __rdtsc_delay           14.4643
|   3763 do_anonymous_page       14.0410
|   3796 statm_pgd_range          9.7835
|   1535 prune_icache             6.9773
|    153 __free_pages             4.7812
|   1420 create_bounce            4.1765
|    583 sym53c8xx_intr           3.9392
|    221 atomic_dec_and_lock      2.7625
|   5214 generic_file_write       2.5659
|
| 273464 total                    0.1903
| 234168 default_idle          4503.2308
|   5298 generic_file_write       2.6073
|   4868 file_read_actor         18.7231
|   3799 statm_pgd_range          9.7912
|   3763 do_anonymous_page       14.0410
|   1535 prune_icache             6.9773
|   1526 shrink_cache             1.6234
|   1469 create_bounce            4.3206
|    643 rmqueue                  1.1320
|    591 sym53c8xx_intr           3.9932
|    505 __make_request           0.2902
|
|
| 2.4.18-pre2 with MH
|
| With the MH patch applied, the issues I witnessed above did not seem to
| reproduce. Memory allocation under pressure seemed faster and smoother.
| kswapd never went above 5-15% CPU. When a.out allocated memory, it did
| not begin swapping until buffer/page cache had been nearly completely
| cannibalized.
|
| And when a.out caused swapping, it was controlled and behaved like you
| would expect the VM to behave -- slowly swapping out unused pages
| instead of large swap write-outs without the patch.
|
| Martin, have you done throughput benchmarks with MH/rmap/aa, BTW?
|
| But both kernels still seem to be sluggish when it comes to doing small
| I/O operations (vi, ls, etc) during heavy swapping activity.
|
| Here are the readprofile results:
|
| 206243 default_idle          3966.2115
|   6486 file_read_actor         24.9462
|    409 __rdtsc_delay           14.6071
|   2798 do_anonymous_page       10.4403
|    185 __free_pages             5.7812
|   1846 statm_pgd_range          4.7577
|    469 sym53c8xx_intr           3.1689
|    176 atomic_dec_and_lock      2.2000
|    349 end_buffer_io_async      1.9830
|    492 refill_inactive          1.8358
|     94 system_call              1.8077
|
| 245776 total                    0.1710
| 216238 default_idle          4158.4231
|   6486 file_read_actor         24.9462
|   2799 do_anonymous_page       10.4440
|   1855 statm_pgd_range          4.7809
|   1611 generic_file_write       0.7928
|    839 __make_request           0.4822
|    820 shrink_cache             0.7374
|    540 rmqueue                  0.9507
|    534 create_bounce            1.5706
|    492 refill_inactive          1.8358
|    487 sym53c8xx_intr           3.2905
|
|
| There may be significant differences in the profile outputs for those
| with VM fu.
|
| Summary: MH swaps _after_ cache has been properly cannibalized, and
| swapping activity starts when expected and is properly throttled.
| kswapd and kupdated don't seem to go into berserk 100% CPU mode.
|
| At any rate, I now have the MH patch (and Andrew Morton's mini-ll and
| read-latency2 patches) in production, and I like what I see so far. I'd
| vote for them to go into 2.4.18, IMHO. Maybe the full low-latency patch
| if it's not truly 2.5 material.
|
| My next cook-off will be with -aa and rmap, although if the rather small
| MH patch fixes my last issues it may be worth putting all VM effort into
| a 2.5 VM cook-off. :) Hopefully the useful stuff in -aa can get pulled
| in at some point soon, though.
|
| Thanks much to Martin H. VanLeeuwen for his patch and Stephan von
| Krawczynski for his recommendations. I'll let MH cook for a while and
| I'll follow up later.
| --
| Ken.
| brownfld@irridia.com
|
| c.c:
|
| #include <stdio.h>
| #include <stdlib.h>	/* malloc, exit */
| #include <string.h>	/* memcpy */
| #include <unistd.h>	/* sleep */
|
| #define MB_OF_RAM 1024
|
| int
| main()
| {
| 	long stuffsize = MB_OF_RAM * 1048576 ;
| 	char *stuff ;
|
| 	if ( (stuff = (char *)malloc( stuffsize )) ) {
| 		long chunksize = 1048576 ;
| 		long c ;
|
| 		/* zero the first chunk, then copy it over the rest */
| 		for ( c=0 ; c<chunksize ; c++ )
| 			*(stuff+c) = '\0' ;
| 		/* hack; last chunk discarded if stuffsize%chunksize != 0 */
| 		for ( ; (c+chunksize)<=stuffsize ; c+=chunksize )
| 			memcpy( stuff+c, stuff, chunksize );
|
| 		sleep( 120 );
| 	}
| 	else
| 		printf("OOPS\n");
|
| 	exit( 0 );
| }
|
|
| On Sat, Jan 05, 2002 at 04:08:33PM +0100, Stephan von Krawczynski wrote:
| [...]
| | I am pretty impressed by Martins test case where merely all VM patches fail
| | with the exception of his own :-) The thing is, this test is not of nature
| | "very special" but more like "system driven to limit by normal processes". And
| | this is the real interesting part about it.
| [...]
|
* Re: [2.4.17/18pre] VM and swap - it's really unusable
2002-01-11 20:41 ` Ken Brownfield
2002-01-11 21:13 ` Mark Hahn
2002-01-11 21:23 ` Ken Brownfield
@ 2002-01-12 0:13 ` M.H.VanLeeuwen
2 siblings, 0 replies; 49+ messages in thread
From: M.H.VanLeeuwen @ 2002-01-12 0:13 UTC
To: Ken Brownfield; +Cc: linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1395 bytes --]

Ken Brownfield wrote:
>
> After more testing, my original observations seem to be holding up,
> except that under heavy VM load (e.g., "make -j bzImage") the machine's
> overall performance seems far lower. For instance, without the patch
> the -j build finishes in ~10 minutes (2x933P3/256MB) but with the patch
> I haven't had the patience to let it finish after more than an hour.
>
> This is perhaps because the vmscan patch is too aggressively shrinking
> the caches, or causing thrashing in another area? I'm also noticing
> that the amount of swap used is nearly an order of magnitude higher,
> which doesn't make sense at first glance... Also, there are extended
> periods where idle CPU is 50-80%.
>
> Maybe the patch or at least its intent can be merged with Andrea's work
> if applicable?
>
> Thanks,
> --
> Ken.
> brownfld@irridia.com
>

Ken,

Attached is an update to my previous vmscan.patch.2.4.17.c

Version "d" fixes a BUG due to a race in the old code _and_
is much less aggressive at cache_shrinkage or conversely more
willing to swap out but not as much as the stock kernel.

It continues to work well wrt to high vm pressure.

Give it a whirl to see if it changes your "-j" symptoms.

If you like you can change the one line in the patch
from "DEF_PRIORITY" which is "6" to progressively smaller
values to "tune" whatever kind of swap_out behaviour you
like.
Martin

[-- Attachment #2: vmscan.patch.2.4.17.d --]
[-- Type: application/octet-stream, Size: 1325 bytes --]

--- linux.virgin/mm/vmscan.c	Mon Dec 31 12:46:25 2001
+++ linux/mm/vmscan.c	Fri Jan 11 18:03:05 2002
@@ -394,9 +394,9 @@
 		if (PageDirty(page) && is_page_cache_freeable(page) && page->mapping) {
 			/*
 			 * It is not critical here to write it only if
-			 * the page is unmapped beause any direct writer
+			 * the page is unmapped because any direct writer
 			 * like O_DIRECT would set the PG_dirty bitflag
-			 * on the phisical page after having successfully
+			 * on the physical page after having successfully
 			 * pinned it and after the I/O to the page is finished,
 			 * so the direct writes to the page cannot get lost.
 			 */
@@ -480,11 +480,14 @@
 
 			/*
 			 * Alert! We've found too many mapped pages on the
-			 * inactive list, so we start swapping out now!
+			 * inactive list.
+			 * Move referenced pages to the active list.
 			 */
-			spin_unlock(&pagemap_lru_lock);
-			swap_out(priority, gfp_mask, classzone);
-			return nr_pages;
+			if (PageReferenced(page) && !PageLocked(page)) {
+				del_page_from_inactive_list(page);
+				add_page_to_active_list(page);
+			}
+			continue;
 		}
 
 		/*
@@ -521,6 +524,9 @@
 	}
 	spin_unlock(&pagemap_lru_lock);
 
+	if (max_mapped <= 0 && (nr_pages > 0 || priority < DEF_PRIORITY))
+		swap_out(priority, gfp_mask, classzone);
+
 	return nr_pages;
 }
 
Thread overview: 49+ messages
2001-12-28 20:16 [2.4.17/18pre] VM and swap - it's really unusable Andreas Hartmann
2001-12-28 20:32 ` Rik van Riel
[not found] ` <3C2CD9EC.1D6C798E@zip.com.au>
2001-12-28 21:26 ` Andreas Hartmann
2001-12-29 0:30 ` Alan Cox
2001-12-29 13:14 ` Andreas Hartmann
2001-12-29 15:15 ` Andrea Arcangeli
2002-01-03 20:23 ` Ken Brownfield
2002-01-03 20:50 ` Rik van Riel
2002-01-03 21:54 ` Andrew Morton
2002-01-04 4:56 ` Ken Brownfield
2002-01-04 0:19 ` Stephan von Krawczynski
2002-01-04 5:26 ` Ken Brownfield
2002-01-04 8:06 ` Ville Herva
2002-01-04 13:05 ` Stephan von Krawczynski
2002-01-04 13:03 ` Stephan von Krawczynski
2002-01-04 23:50 ` Ken Brownfield
2002-01-05 15:08 ` Stephan von Krawczynski
2002-01-05 21:40 ` Ken Brownfield
2002-01-06 15:48 ` Stephan von Krawczynski
2002-01-08 5:09 ` Ken Brownfield
2002-01-07 1:42 ` Rik van Riel
2002-01-07 2:22 ` Rik van Riel
2002-01-07 14:20 ` Stephan von Krawczynski
2002-01-08 0:36 ` Rik van Riel
2002-01-08 15:19 ` Update " Ken Brownfield
2002-01-04 20:15 ` Andreas Hartmann
2002-01-04 20:55 ` Stephan von Krawczynski
2002-01-05 8:39 ` Andreas Hartmann
2002-01-05 12:59 ` M. Edward (Ed) Borasky
2002-01-05 15:09 ` Andreas Hartmann
2002-01-05 17:51 ` M. Edward (Ed) Borasky
2002-01-06 15:51 ` vda
2002-01-06 19:16 ` M. Edward (Ed) Borasky
2002-01-06 19:38 ` Alan Cox
2002-01-07 0:47 ` M. Edward Borasky
2002-01-05 9:24 ` Petro
2002-01-05 15:44 ` Stephan von Krawczynski
2002-01-07 7:15 ` Petro
2002-01-07 14:33 ` Stephan von Krawczynski
2002-01-07 20:29 ` Petro
2002-01-08 1:43 ` Stephan von Krawczynski
2002-01-08 3:10 ` Petro
2002-01-08 6:00 ` Petro
2002-01-11 20:41 ` Ken Brownfield
2002-01-11 21:13 ` Mark Hahn
2002-01-11 21:38 ` Ken Brownfield
2002-01-11 23:38 ` Rik van Riel
2002-01-11 21:23 ` Ken Brownfield
2002-01-12 0:13 ` M.H.VanLeeuwen