[2.4.17/18pre] VM and swap - it's really unusable

public inbox for linux-kernel@vger.kernel.org

* [2.4.17/18pre] VM and swap - it's really unusable
@ 2001-12-28 20:16 Andreas Hartmann
  2001-12-28 20:32 ` Rik van Riel
                   ` (5 more replies)
  0 siblings, 6 replies; 49+ messages in thread
From: Andreas Hartmann @ 2001-12-28 20:16 UTC (permalink / raw)
  To: Kernel-Mailingliste

Hello all,

Again, I did an rsync operation as described in
"[2.4.17rc1] Swapping" MID <3C1F4014.2010705@athlon.maya.org>.

This time, the kernel had a swap partition of about 200MB. Once the
swap partition was fully used, the kernel killed all processes of knode.
Nearly 50% of RAM was being used for buffers at that moment. Why is
so much memory used for buffers?

I know I repeat it, but please:

	Fix the VM management in kernel 2.4.x. It's unusable. Believe
	me! For comparison: kernel 2.2.19 needed almost no swap for
	the same operation!

Please consider that I'm using 512 MB of RAM. This should, or better: 
must be enough to do the rsync-operation nearly without any swapping - 
kernel 2.2.19 does it!

The performance of kernel 2.4.18pre1 is very poor, which is no surprise, 
because the machine swaps nearly nonstop.

Regards,
Andreas Hartmann

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [2.4.17/18pre] VM and swap - it's really unusable
  2001-12-28 20:16 [2.4.17/18pre] VM and swap - it's really unusable Andreas Hartmann
@ 2001-12-28 20:32 ` Rik van Riel
       [not found] ` <3C2CD9EC.1D6C798E@zip.com.au>
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 49+ messages in thread
From: Rik van Riel @ 2001-12-28 20:32 UTC (permalink / raw)
  To: Andreas Hartmann; +Cc: linux-kernel

On Fri, 28 Dec 2001, Andreas Hartmann wrote:

> 	Fix the VM-management in kernel 2.4.x. It's unusable. Believe
> 	me! As comparison: kernel 2.2.19 didn't need nearly any swap for
> 	the same operation!

If you feel adventurous you can try my rmap based
VM, the latest version is on:

	http://surriel.com/patches/2.4/2.4.17-rmap-8

This VM should behave a bit better (it does on my machines),
but isn't yet bug-free enough to be used on production machines.
Also, the changes it introduces are, IMHO, too big for a stable
kernel series ;)

regards,

Rik
-- 
DMCA, SSSCA, W3C?  Who cares?  http://thefreeworld.net/

http://www.surriel.com/		http://distro.conectiva.com/



* Re: [2.4.17/18pre] VM and swap - it's really unusable
       [not found] ` <3C2CD9EC.1D6C798E@zip.com.au>
@ 2001-12-28 21:26   ` Andreas Hartmann
  0 siblings, 0 replies; 49+ messages in thread
From: Andreas Hartmann @ 2001-12-28 21:26 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Kernel-Mailingliste

Andrew Morton wrote:

> Andreas Hartmann wrote:
> 
>>Hello all,
>>
>>Again, I did a rsync-operation as described in
>>"[2.4.17rc1] Swapping" MID <3C1F4014.2010705@athlon.maya.org>.
>>
>>This time, the kernel had a swappartition which was about 200MB. As the
>>swap-partition was fully used, the kernel killed all processes of knode.
>>Nearly 50% of RAM had been used for buffers at this moment. Why is there
>>so much memory used for buffers?
>>
> 
> It's very strange.  The large amount of buffercache usage is to
> be expected from statting 20 gigs worth of files, but the kernel
> should (and normally does) free up that memory on demand.
> 
> Which filesystem(s) are you using?
> 
> Are you using NFS/NBD/SMBFS or anything like that?
> 

Basically, I'm using NFS and reiserfs. But I didn't use any file on NFS 
since the last reboot - and the NFS-shares haven't been mounted.

There are 2 IDE-Harddisks in this machine:
hda: WDC WD205AA, ATA DISK drive (40079088 sectors (20520 MB) w/2048KiB
				  cache, CHS=2494/255/63, UDMA(66))
hdb: WDC WD450AA-00BAA0, ATA DISK drive (87930864 sectors (45021 MB)
					w/2048KiB Cache,
					CHS=5473/255/63, UDMA(66))

On hda, I have got 7 partitions (plus one little "boot" partition, which
isn't mounted, and a 200MB swap partition).
On hdb, I have got 12 partitions and one more swap partition, which is now 1GB.
All partitions are formatted with reiserfs.


Regards,
Andreas Hartmann



* Re: [2.4.17/18pre] VM and swap - it's really unusable
  2001-12-28 20:16 [2.4.17/18pre] VM and swap - it's really unusable Andreas Hartmann
  2001-12-28 20:32 ` Rik van Riel
       [not found] ` <3C2CD9EC.1D6C798E@zip.com.au>
@ 2001-12-29  0:30 ` Alan Cox
  2001-12-29 13:14 ` Andreas Hartmann
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 49+ messages in thread
From: Alan Cox @ 2001-12-29  0:30 UTC (permalink / raw)
  To: Andreas Hartmann; +Cc: Kernel-Mailingliste

> 	Fix the VM-management in kernel 2.4.x. It's unusable. Believe
> 	me! As comparison: kernel 2.2.19 didn't need nearly any swap for
> 	the same operation!
> The performance of kernel 2.4.18pre1 is very poor, which is no surprise, 
> because the machine swaps nearly nonstop.

Does the 2.4.9 Red Hat kernel (if you are using RH) or 2.4.12-ac8 show the
same problem?


* Re: [2.4.17/18pre] VM and swap - it's really unusable
  2001-12-28 20:16 [2.4.17/18pre] VM and swap - it's really unusable Andreas Hartmann
                   ` (2 preceding siblings ...)
  2001-12-29  0:30 ` Alan Cox
@ 2001-12-29 13:14 ` Andreas Hartmann
  2001-12-29 15:15 ` Andrea Arcangeli
  2002-01-03 20:23 ` Ken Brownfield
  5 siblings, 0 replies; 49+ messages in thread
From: Andreas Hartmann @ 2001-12-29 13:14 UTC (permalink / raw)
  To: Kernel-Mailingliste

Andreas Hartmann wrote:

> Hello all,
> 
> Again, I did a rsync-operation as described in
> "[2.4.17rc1] Swapping" MID <3C1F4014.2010705@athlon.maya.org>.
> 

Some other examples:
I just did a
cp -Rd linux-2.4.16 linux-2.4.17
(with object files). Before starting, I had about 120 MB of
free RAM. During the copy (I did nothing else meanwhile), 2MB of swap
were used and 12 MB of RAM were free. Most of the memory was used
for caching, which is ok.
After the copy, only 10 MB of memory had been freed again; 490MB of RAM
were now in use (almost all of it for caching).

Starting from this situation, I ran another small cp:
cp -Rd linux-2.4.18pre1 linux-2.4.test
(again including object files).
Result: the swap usage stayed nearly constant; nevertheless, there were
6 accesses to swap.

Now, I deleted the linux-2.4.test-directory with
rm -R linux-2.4.test
This action was very fast (approximately 1s).

Afterwards, a big part of the cache memory was freed (about
100MB). Now, 122MB of RAM were free again.

Next example (running after the last):
SuSE run-crons have been running. This means:
-> updatedb
-> sort
-> frcode
-> find
-> mandb

47MB swap used, 2/3 of memory is used for buffers (Don't forget: I've 
got 512MB of RAM) and about 30MB of RAM are free.

My observation:
Why does the kernel swap to get free memory for caching / buffering? I
can't see any sense in this. Wouldn't it be better to shrink the
caching / buffering RAM to the amount of memory that is actually free?

Swapping should mainly be used when RAM runs out for real memory
(memory that is used by running applications). IMHO, the memory used
for cache and buffers should be reduced first, before starting to
swap.

Or would it be possible to implement more than one swapping strategy,
which could be selected during make menuconfig? This would give the
user the chance to find the best swapping strategy for his purpose.
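
The free, buffer, cache and swap figures quoted in the examples above can be
sampled before and after each step. A minimal sketch, assuming a
/proc/meminfo that reports "Name:  value kB" lines; the helper name
meminfo_mb is made up for illustration and is not part of any tool
mentioned in this thread:

```shell
#!/bin/sh
# meminfo_mb [FILE]
# Print selected memory counters in MB, one NAME=VALUE pair per line.
# FILE defaults to /proc/meminfo; any file in the same
# "Name:   value kB" format works (assumption: values are in kB).
meminfo_mb() {
    awk '
        /^(MemTotal|MemFree|Buffers|Cached|SwapTotal|SwapFree):/ {
            name = $1
            sub(/:$/, "", name)               # strip the trailing colon
            printf "%s=%d\n", name, $2 / 1024 # kB -> MB, truncated
        }
    ' "${1:-/proc/meminfo}"
}

# Possible use around one of the copies described above:
#   meminfo_mb > before.txt
#   cp -Rd linux-2.4.16 linux-2.4.17
#   meminfo_mb > after.txt
#   diff before.txt after.txt
```

Comparing the two snapshots shows how much of the change went into Buffers
and Cached and how much into swap, which is the distinction the question
above is about.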

Regards,
Andreas Hartmann


* Re: [2.4.17/18pre] VM and swap - it's really unusable
  2001-12-28 20:16 [2.4.17/18pre] VM and swap - it's really unusable Andreas Hartmann
                   ` (3 preceding siblings ...)
  2001-12-29 13:14 ` Andreas Hartmann
@ 2001-12-29 15:15 ` Andrea Arcangeli
  2002-01-03 20:23 ` Ken Brownfield
  5 siblings, 0 replies; 49+ messages in thread
From: Andrea Arcangeli @ 2001-12-29 15:15 UTC (permalink / raw)
  To: Andreas Hartmann; +Cc: Kernel-Mailingliste

On Fri, Dec 28, 2001 at 09:16:38PM +0100, Andreas Hartmann wrote:
> Hello all,
> 
> Again, I did a rsync-operation as described in
> "[2.4.17rc1] Swapping" MID <3C1F4014.2010705@athlon.maya.org>.
> 
> This time, the kernel had a swappartition which was about 200MB. As the 
> swap-partition was fully used, the kernel killed all processes of knode.
> Nearly 50% of RAM had been used for buffers at this moment. Why is there 
> so much memory used for buffers?
> 
> I know I repeat it, but please:
> 
> 	Fix the VM-management in kernel 2.4.x. It's unusable. Believe
> 	me! As comparison: kernel 2.2.19 didn't need nearly any swap for
> 	the same operation!
> 
> Please consider that I'm using 512 MB of RAM. This should, or better: 
> must be enough to do the rsync-operation nearly without any swapping - 
> kernel 2.2.19 does it!
> 
> The performance of kernel 2.4.18pre1 is very poor, which is no surprise, 
> because the machine swaps nearly nonstop.

please try to reproduce on 2.4.17rc2aa2, thanks.

	ftp://ftp.us.kernel.org/pub/linux/kernel/people/andrea/kernels/v2.4/2.4.17rc2aa2.bz2

Andrea


* Re: [2.4.17/18pre] VM and swap - it's really unusable
  2001-12-28 20:16 [2.4.17/18pre] VM and swap - it's really unusable Andreas Hartmann
                   ` (4 preceding siblings ...)
  2001-12-29 15:15 ` Andrea Arcangeli
@ 2002-01-03 20:23 ` Ken Brownfield
  2002-01-03 20:50   ` Rik van Riel
                     ` (3 more replies)
  5 siblings, 4 replies; 49+ messages in thread
From: Ken Brownfield @ 2002-01-03 20:23 UTC (permalink / raw)
  To: Andreas Hartmann; +Cc: Kernel-Mailingliste

Unfortunately, I lost the response that basically said "2.4 looks stable
to me", but let me count the ways in which I agree with Andreas'
sentiment:

	A) VM has major issues
		1) about a dozen recent OOPS reports in VM code
		2) VM falls down on large-memory machines with a
		   high inode count (slocate/updatedb, i/dcache)
		3) Memory allocation failures and OOM triggers
		   even though caches remain full.
		4) Other bugs fixed in -aa and others
	B) Live- and dead-locks that I'm seeing on all 2.4 production
	   machines > 2.4.9, possibly related to A.  But how will I
	   ever find out?
	C) IO-APIC code that requires noapic on any and all SMP
	   machines that I've ever run on.

I don't have anything against anyone here -- I think everyone is doing a
fine job.  It's an issue of acceptance of the problem and focus.  These
issues are all showstoppers for me, and while I don't represent the 90%
of the Linux market that is UP desktops, IMHO future work on the kernel
will be degraded by basic functionality that continues to cause
problems.

I think seeing some of Andrea's and Andrew's et al patches actually
*happen* would be a good thing, since 2.4 kernels are decidedly not
ready for production here.  I am forced to apply 26 distinct patch sets
to my kernels, and I am NOT the right person to make these judgements.
Which is why I was interested in an LKML summary source, though I
haven't yet had a chance to catch up on that thread of comment.

Having a glitch in the radeon driver is one thing; having persistent,
fatal, and reproducible failures in universal kernel code is entirely
another.

-- 
Ken.
brownfld@irridia.com

On Fri, Dec 28, 2001 at 09:16:38PM +0100, Andreas Hartmann wrote:
| Hello all,
| 
| Again, I did a rsync-operation as described in
| "[2.4.17rc1] Swapping" MID <3C1F4014.2010705@athlon.maya.org>.
| 
| This time, the kernel had a swappartition which was about 200MB. As the 
| swap-partition was fully used, the kernel killed all processes of knode.
| Nearly 50% of RAM had been used for buffers at this moment. Why is there 
| so much memory used for buffers?
| 
| I know I repeat it, but please:
| 
| 	Fix the VM-management in kernel 2.4.x. It's unusable. Believe
| 	me! As comparison: kernel 2.2.19 didn't need nearly any swap for
| 	the same operation!
| 
| Please consider that I'm using 512 MB of RAM. This should, or better: 
| must be enough to do the rsync-operation nearly without any swapping - 
| kernel 2.2.19 does it!
| 
| The performance of kernel 2.4.18pre1 is very poor, which is no surprise, 
| because the machine swaps nearly nonstop.
| 
| 
| Regards,
| Andreas Hartmann
| 


* Re: [2.4.17/18pre] VM and swap - it's really unusable
  2002-01-03 20:23 ` Ken Brownfield
@ 2002-01-03 20:50   ` Rik van Riel
  2002-01-03 21:54   ` Andrew Morton
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 49+ messages in thread
From: Rik van Riel @ 2002-01-03 20:50 UTC (permalink / raw)
  To: Ken Brownfield; +Cc: Andreas Hartmann, Kernel-Mailingliste

On Thu, 3 Jan 2002, Ken Brownfield wrote:

> 	A) VM has major issues
> 		1) about a dozen recent OOPS reports in VM code
> 		2) VM falls down on large-memory machines with a
> 		   high inode count (slocate/updatedb, i/dcache)
> 		3) Memory allocation failures and OOM triggers
> 		   even though caches remain full.
> 		4) Other bugs fixed in -aa and others
> 	B) Live- and dead-locks that I'm seeing on all 2.4 production
> 	   machines > 2.4.9, possibly related to A.  But how will I
> 	   ever find out?

I've spent ages trying to fix these bugs in the -ac kernel,
but they got all backed out in search of better performance.

Right now I'm developing a VM again, but I have no interest
at all in fixing the livelocks in the main kernel, they'll
just get removed again after a while.

If you want to test my VM stuff, you can get patches from
http://surriel.com/patches/ or direct access at the bitkeeper
tree on http://linuxvm.bkbits.net/

cheers,

Rik
-- 
Shortwave goes a long way:  irc.starchat.net  #swl

http://www.surriel.com/		http://distro.conectiva.com/



* Re: [2.4.17/18pre] VM and swap - it's really unusable
  2002-01-03 20:23 ` Ken Brownfield
  2002-01-03 20:50   ` Rik van Riel
@ 2002-01-03 21:54   ` Andrew Morton
  2002-01-04  4:56     ` Ken Brownfield
  2002-01-04  0:19   ` Stephan von Krawczynski
  2002-01-11 20:41   ` Ken Brownfield
  3 siblings, 1 reply; 49+ messages in thread
From: Andrew Morton @ 2002-01-03 21:54 UTC (permalink / raw)
  To: Ken Brownfield; +Cc: Andreas Hartmann, Kernel-Mailingliste

Ken Brownfield wrote:
> 
> Unfortunately, I lost the response that basically said "2.4 looks stable
> to me", but let me count the ways in which I agree with Andreas'
> sentiment:
> 
>         A) VM has major issues
>                 1) about a dozen recent OOPS reports in VM code

Ben LaHaise's fix for page_cache_release() is absolutely required.

>                 2) VM falls down on large-memory machines with a
>                    high inode count (slocate/updatedb, i/dcache)
>                 3) Memory allocation failures and OOM triggers
>                    even though caches remain full.
>                 4) Other bugs fixed in -aa and others
>         B) Live- and dead-locks that I'm seeing on all 2.4 production
>            machines > 2.4.9, possibly related to A.  But how will I
>            ever find out?

Does this happen with the latest -aa patch?  If so, please send
a full system description and report.

>         C) IO-APIC code that requires noapic on any and all SMP
>            machines that I've ever run on.

Dunno about this one.  Have you prepared a description?
 

-


* Re: [2.4.17/18pre] VM and swap - it's really unusable
  2002-01-03 20:23 ` Ken Brownfield
  2002-01-03 20:50   ` Rik van Riel
  2002-01-03 21:54   ` Andrew Morton
@ 2002-01-04  0:19   ` Stephan von Krawczynski
  2002-01-04  5:26     ` Ken Brownfield
  2002-01-04 20:15     ` Andreas Hartmann
  2002-01-11 20:41   ` Ken Brownfield
  3 siblings, 2 replies; 49+ messages in thread
From: Stephan von Krawczynski @ 2002-01-04  0:19 UTC (permalink / raw)
  To: Ken Brownfield; +Cc: Andreas Hartmann, Kernel-Mailingliste

> Unfortunately, I lost the response that basically said "2.4 looks stable
> to me", but let me count the ways in which I agree with Andreas'
> sentiment:
>                                                                     
> A) VM has major issues                                              
                                                                      
On all boxes I run currently (all 1GB or below RAM), I cannot find    
_major_ issues.                                                       
                                                                      
> 2) VM falls down on large-memory machines with a                    
>    high inode count (slocate/updatedb, i/dcache)                    
                                                                      
Must be beyond the GB range.                                          
                                                                      
> 3) Memory allocation failures and OOM triggers                      
>    even though caches remain full.                                  
                                                                      
I have not had one up to now in everyday life with 2.4.17             
                                                                      
> 4) Other bugs fixed in -aa and others                               
                                                                      
Hm, well I would expect Andrea to do tuning and fixing as experience  
evolves...                                                            
                                                                      
> B) Live- and dead-locks that I'm seeing on all 2.4 production       
> 	   machines > 2.4.9, possibly related to A.  But how will I        
> 	   ever find out?                                                  
                                                                      
Me = none up to now I could track down to a kernel issue. The single  
one I had was with a distro kernel around 2.4.10 and flaky hardware.  
                                                                      
> C) IO-APIC code that requires noapic on any and all SMP             
>   machines that I've ever run on.                                   
                                                                      
I am currently running 5 Asus CUV4X-D based SMP boxes all with apic   
_on_, amongst  which are squids, sql servers, workstation type setups 
(2 my very own).                                                      
                                                                      
> I don't have anything against anyone here -- I think everyone is doing a
> fine job.  It's an issue of acceptance of the problem and focus.  These
> issues are all showstoppers for me, and while I don't represent the 90%
> of the Linux market that is UP desktops, IMHO future work on the kernel
> will be degraded by basic functionality that continues to cause
> problems.
                                                                      
Have you run _yourself_ into a problem with 2.4.17?                   
I mean it is not perfect of course, but it is far better than you make
it look.                                                              
I could hand the brown bag to all versions below about 2.4.15 pretty
easily, but since 2.4.16 it has really become hard to shoot it down for
me. Ok, I use only pretty selected hardware, but there are reasons I
do, and they are not related to the kernel in the first place.
                                                                      
Regards,                                                              
Stephan                                                               
                                                                      


* Re: [2.4.17/18pre] VM and swap - it's really unusable
  2002-01-03 21:54   ` Andrew Morton
@ 2002-01-04  4:56     ` Ken Brownfield
  0 siblings, 0 replies; 49+ messages in thread
From: Ken Brownfield @ 2002-01-04  4:56 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel

Actually, I posted about C) many moons ago, and had some chats with
Manfred Spraul and Alan.  It's a tough one to crack, and I have my own
workaround patch (below) that I've been using for a while now.  My posts
are in the archives, but I can send a summary by request.

I haven't succeeded my bag check in putting -aa in production, which is
where I'm able to reproduce these problems.  Part of the problem is me,
in that I can't easily test with -aa.  And part of the problem is
chicken vs egg -- can't test unless it's in mainline, don't want to put
questionable stuff in a release kernel, even a -pre...  But I do think
the -aa stuff is worth breaking out into Marcelo-digestable chunks as
soon as Andrea can.

The machines that are OOPSing are in production and right now don't have
serial consoles available... that will change in a month or so, but
right now I can't decode OOPSes without hand-copying.  I might get that
desperate unless the problem goes away with 2.4.18 (with -aa merged,
hopefully.  :)

Thanks much,
-- 
Ken.
brownfld@irridia.com

Applies to any recent 2.4.  Changing indent sucks.

--- linux/arch/i386/kernel/io_apic.c.orig	Tue Nov 13 17:28:41 2001
+++ linux/arch/i386/kernel/io_apic.c	Tue Dec 18 15:10:45 2001
@@ -172,6 +172,7 @@
 int pirq_entries [MAX_PIRQS];
 int pirqs_enabled;
 int skip_ioapic_setup;
+int pintimer_setup;

 static int __init ioapic_setup(char *str)
 {
@@ -179,7 +180,14 @@
 	return 1;
 }

+static int __init do_pintimer_setup(char *str)
+{
+	pintimer_setup = 1;
+	return 1;
+}
+
 __setup("noapic", ioapic_setup);
+__setup("pintimer", do_pintimer_setup);

 static int __init ioapic_pirq_setup(char *str)
 {
@@ -1524,27 +1532,31 @@
 		printk(KERN_ERR "..MP-BIOS bug: 8254 timer not connected to IO-APIC\n");
 	}

-	printk(KERN_INFO "...trying to set up timer (IRQ0) through the 8259A ... ");
-	if (pin2 != -1) {
-		printk("\n..... (found pin %d) ...", pin2);
-		/*
-		 * legacy devices should be connected to IO APIC #0
-		 */
-		setup_ExtINT_IRQ0_pin(pin2, vector);
-		if (timer_irq_works()) {
-			printk("works.\n");
-			if (nmi_watchdog == NMI_IO_APIC) {
-				setup_nmi();
-				check_nmi_watchdog();
+	if ( pintimer_setup )
+		printk(KERN_INFO "...skipping 8259A init for IRQ0\n");
+	else {
+		printk(KERN_INFO "...trying to set up timer (IRQ0) through the 8259A ... ");
+		if (pin2 != -1) {
+			printk("\n..... (found pin %d) ...", pin2);
+			/*
+			 * legacy devices should be connected to IO APIC #0
+			 */
+			setup_ExtINT_IRQ0_pin(pin2, vector);
+			if (timer_irq_works()) {
+				printk("works.\n");
+				if (nmi_watchdog == NMI_IO_APIC) {
+					setup_nmi();
+					check_nmi_watchdog();
+				}
+				return;
 			}
-			return;
+			/*
+			 * Cleanup, just in case ...
+			 */
+			clear_IO_APIC_pin(0, pin2);
 		}
-		/*
-		 * Cleanup, just in case ...
-		 */
-		clear_IO_APIC_pin(0, pin2);
+		printk(" failed.\n");
 	}
-	printk(" failed.\n");

 	if (nmi_watchdog) {
 		printk(KERN_WARNING "timer doesnt work through the IO-APIC - disabling NMI Watchdog!\n");
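
To use the workaround, the patched kernel has to be booted with pintimer on
its command line; the patch then prints "...skipping 8259A init for IRQ0"
instead of trying the 8259A route. A small sketch for checking that the
option actually reached the kernel, assuming /proc/cmdline is available;
has_boot_flag is a hypothetical helper and not part of the patch:

```shell
#!/bin/sh
# has_boot_flag FLAG [FILE]
# Succeed (exit status 0) if FLAG appears as a whole word in FILE.
# FILE defaults to /proc/cmdline (assumption: the running kernel
# exposes its boot command line there).
has_boot_flag() {
    # One word per line, then look for an exact, literal match.
    tr -s ' \t' '\n' < "${2:-/proc/cmdline}" | grep -qxF -- "$1"
}

# Possible use after rebooting into the patched kernel:
#   has_boot_flag pintimer && echo "pintimer was passed to the kernel"
#   dmesg | grep 'skipping 8259A init'
```

The whole-word match is deliberate, so that an unrelated option that merely
contains the string "pintimer" is not counted.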
On Thu, Jan 03, 2002 at 01:54:14PM -0800, Andrew Morton wrote:
| Ken Brownfield wrote:
| > 
| > Unfortunately, I lost the response that basically said "2.4 looks stable
| > to me", but let me count the ways in which I agree with Andreas'
| > sentiment:
| > 
| >         A) VM has major issues
| >                 1) about a dozen recent OOPS reports in VM code
| 
| Ben LaHaise's fix for page_cache_release() is absolutely required.
| 
| >                 2) VM falls down on large-memory machines with a
| >                    high inode count (slocate/updatedb, i/dcache)
| >                 3) Memory allocation failures and OOM triggers
| >                    even though caches remain full.
| >                 4) Other bugs fixed in -aa and others
| >         B) Live- and dead-locks that I'm seeing on all 2.4 production
| >            machines > 2.4.9, possibly related to A.  But how will I
| >            ever find out?
| 
| Does this happen with the latest -aa patch?  If so, please send
| a full system description and report.
| 
| >         C) IO-APIC code that requires noapic on any and all SMP
| >            machines that I've ever run on.
| 
| Dunno about this one.  Have you prepared a description?
|  
| 
| -


* Re: [2.4.17/18pre] VM and swap - it's really unusable
  2002-01-04  0:19   ` Stephan von Krawczynski
@ 2002-01-04  5:26     ` Ken Brownfield
  2002-01-04  8:06       ` Ville Herva
  2002-01-04 13:03       ` Stephan von Krawczynski
  2002-01-04 20:15     ` Andreas Hartmann
  1 sibling, 2 replies; 49+ messages in thread
From: Ken Brownfield @ 2002-01-04  5:26 UTC (permalink / raw)
  To: Stephan von Krawczynski; +Cc: linux-kernel

On Fri, Jan 04, 2002 at 01:19:28AM +0100, Stephan von Krawczynski wrote:
| > A) VM has major issues                                              
|                                                                       
| On all boxes I run currently (all 1GB or below RAM), I cannot find    
| _major_ issues.                                                       

Yeah, I'm seeing it primarily with 1-4GB, though I have very few <1GB
machines in production.

| > 2) VM falls down on large-memory machines with a                    
| >    high inode count (slocate/updatedb, i/dcache)                    
|                                                                       
| Must be beyond the GB range.                                          

The critical part is the high inode count -- memory amount increases the
severity rather than triggering the problem.

| > 3) Memory allocation failures and OOM triggers                      
| >    even though caches remain full.                                  
|                                                                       
| I have not had one up to now in everyday life with 2.4.17             

I'm seeing this in malloc()-heavy apps, but fairly sporadic unless I
create a test case.  On desktops, most of these issues disappear, but I
do think the mindset behind the kernel needs to at least partially break
free of the grip of UP desktops, at least to the point of fixing issues
like I'm mentioning.

Not critical for me; but high-profile on lkml.

[...]
| > C) IO-APIC code that requires noapic on any and all SMP             
| >   machines that I've ever run on.                                   
|                                                                       
| I am currently running 5 Asus CUV4X-D based SMP boxes all with apic   
| _on_, amongst  which are squids, sql servers, workstation type setups 
| (2 my very own).                                                      

Do they have *sustained* heavy hit/IRQ/IO load?  For example, sending
25Mbit and >1,000 connections/s of sustained small images traffic
through khttpd will kill 2.4 (slow loss of timer and eventual total
freeze) in a couple of hours.  Trivially reproducible for me on SMP with
any amount of memory.  On HP, Tyan, Intel, Asus... etc.
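
For scale, the figures above imply fairly small transfers per connection.
A rough back-of-the-envelope sketch, assuming 1 Mbit = 1,000,000 bits and
ignoring protocol overhead; avg_bytes_per_conn is a hypothetical helper:

```shell
#!/bin/sh
# avg_bytes_per_conn MBIT_PER_S CONNS_PER_S
# Average payload per connection in bytes, using integer arithmetic.
# Assumes decimal megabits and no protocol overhead.
avg_bytes_per_conn() {
    echo $(( $1 * 1000000 / 8 / $2 ))
}

avg_bytes_per_conn 25 1000   # prints 3125
```

At 25 Mbit/s and 1,000 connections/s this is about 3 KB per connection on
average; with more than 1,000 connections/s the average is lower still.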

| Have you run _yourself_ into a problem with 2.4.17?                   
| I mean it is not perfect of course, but it is far better than you make
| it look.                                                              

2.4.17 (and -pre/-rc) is my yardstick, actually.  With the exception of
-aa, I stay very close to the bleeding edge.

Please don't misunderstand -- I don't think any 2.4 kernel sucks (with
the exception of the two or three DONTUSE kernels. :)  In fact, I have
zero complaints other than the ones I've listed.  I was ecstatic when
2.2 came out, and 2.4 is just as impressive.

It's not that the kernel is bad, it's that there are specific things
that shouldn't be forgotten because of a "the kernel is good"
evaluation.  Especially those that make Linux regularly unstable in
common production environments.

| I could hand the brown bag to all versions below about 2.4.15  pretty 
| easy, but since 2.4.16 it has really become hard to shoot it down for 
| me. Ok, I use only pretty selected hardware, but there are reasons I  
| do, and they are not related to the kernel in first place.            

I use pretty selected hardware as well -- scaling hundreds of servers
for varied uses really depends on having someone track and select
hardware, and using it homogeneously.  Of course, of all of the selected
hardware I've used over the last two years since 2.4.0-test1, C) has
persisted on all configurations, but the others are more recent but
equally omnipresent.

Like I said, I suspect that most people with machines in lower-load
environments don't have these issues, but "number of people affected" is
only one metric to judge the importance of an issue.

Of course, I'm not biased or anything. ;-)

Thanks for the input,
-- 
Ken.
brownfld@irridia.com

|                                                                       
| Regards,                                                              
| Stephan                                                               
|                                                                       


* Re: [2.4.17/18pre] VM and swap - it's really unusable
  2002-01-04  5:26     ` Ken Brownfield
@ 2002-01-04  8:06       ` Ville Herva
  2002-01-04 13:05         ` Stephan von Krawczynski
  2002-01-04 13:03       ` Stephan von Krawczynski
  1 sibling, 1 reply; 49+ messages in thread
From: Ville Herva @ 2002-01-04  8:06 UTC (permalink / raw)
  To: Ken Brownfield; +Cc: Stephan von Krawczynski, linux-kernel

On Thu, Jan 03, 2002 at 11:26:01PM -0600, you [Ken Brownfield] claimed:
> 
> | > 3) Memory allocation failures and OOM triggers                      
> | >    even though caches remain full.                                  
> |                                                                       
> | I have not had one up to now in everyday life with 2.4.17             
> 
> I'm seeing this in malloc()-heavy apps, but fairly sporadic unless I
> create a test case.  

I'm seeing this on 2GB IA64 (2.4.16-17). I posted a _very_ simple test case
to lkml a while ago. It didn't happen on 256MB x86.

I plan to try -aa shortly, now that I got patches to make it compile on
IA64.


-- v --

v@iki.fi


* Re: [2.4.17/18pre] VM and swap - it's really unusable
  2002-01-04  5:26     ` Ken Brownfield
  2002-01-04  8:06       ` Ville Herva
@ 2002-01-04 13:03       ` Stephan von Krawczynski
  2002-01-04 23:50         ` Ken Brownfield
  1 sibling, 1 reply; 49+ messages in thread
From: Stephan von Krawczynski @ 2002-01-04 13:03 UTC (permalink / raw)
  To: Ken Brownfield; +Cc: linux-kernel

On Thu, 3 Jan 2002 23:26:01 -0600
Ken Brownfield <brownfld@irridia.com> wrote:

> On Fri, Jan 04, 2002 at 01:19:28AM +0100, Stephan von Krawczynski wrote:
> | > A) VM has major issues                                              
> |                                                                       
> | On all boxes I run currently (all 1GB or below RAM), I cannot find    
> | _major_ issues.                                                       
> 
> Yeah, I'm seeing it primarily with 1-4GB, though I have very few <1GB
> machines in production.

Ok. It would be really nice to know if the -aa patches do any good at your
configs. Andrea has possibly done something on the issue. But let me take this
chance to state an open word: last time Andrea talked about his personal
hardware I couldn't really believe it - because it was so ridiculously small. I
wonder if anyone at SuSE management _does_ actually read this list and think
about how someone can do a good job without good equipment. If you really want
to do something groundbreaking about highmem you have to have a _box_. A box
_somewhere_ in the world or a patch for highmem-in-lowmem is not really the
same thing. Even Schumacher wouldn't have won formula one by sitting inside a
Fiat Uno with a patched speedometer.

> but I
> do think the mindset behind the kernel needs to at least partially break
> free of the grip of UP desktops, at least to the point of fixing issues
> like I'm mentioning.
> 
> Not critical for me; but high-profile on lkml.

You are right.

> [...]
> | > C) IO-APIC code that requires noapic on any and all SMP             
> | >   machines that I've ever run on.                                   
> |                                                                       
> | I am currently running 5 Asus CUV4X-D based SMP boxes all with apic   
> | _on_, amongst  which are squids, sql servers, workstation type setups 
> | (2 my very own).                                                      
> 
> Do they have *sustained* heavy hit/IRQ/IO load?  For example, sending
> 25Mbit and >1,000 connections/s of sustained small images traffic
> through khttpd will kill 2.4 (slow loss of timer and eventual total
> freeze) in a couple of hours.  Trivially reproducible for me on SMP with
> any amount of memory.  On HP, Tyan, Intel, Asus... etc.

Hm, I have about 24GB of NFS traffic every day, which may be too little. What
exactly are you seeing in this case (logfiles etc.)?

> It's not that the kernel is bad, it's that there are specific things
> that shouldn't be forgotten because of a "the kernel is good"
> evaluation.

Hopefully nobody does this here, I don't.

> Like I said, I suspect that most people with machines in lower-load
> environments don't have these issues, but "number of people affected" is
> only one metric to judge the importance of an issue.

The number of people is not really interesting for me, as the boxes get bigger
every day it is only a matter of time to see more people with lots of GB (as an
example).

> Of course, I'm not biased or anything. ;-)

How could you ? ;-))

Regards,
Stephan

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [2.4.17/18pre] VM and swap - it's really unusable
  2002-01-04  8:06       ` Ville Herva
@ 2002-01-04 13:05         ` Stephan von Krawczynski
  0 siblings, 0 replies; 49+ messages in thread
From: Stephan von Krawczynski @ 2002-01-04 13:05 UTC (permalink / raw)
  To: Ville Herva; +Cc: brownfld, linux-kernel

On Fri, 4 Jan 2002 10:06:05 +0200
Ville Herva <vherva@niksula.hut.fi> wrote:

> On Thu, Jan 03, 2002 at 11:26:01PM -0600, you [Ken Brownfield] claimed:
> > 
> > | > 3) Memory allocation failures and OOM triggers                      
> > | >    even though caches remain full.                                  
> > |                                                                       
> > | I have not had one up to now in everyday life with 2.4.17             
> > 
> > I'm seeing this in malloc()-heavy apps, but fairly sporadic unless I
> > create a test case.  
> 
> I'm seeing this on 2GB IA64 (2.4.16-17). I posted a _very_ simple test case
> to lkml a while ago. It didn't happen on 256MB x86.
> 
> I plan to try -aa shortly, now that I got patches to make it compile on
> IA64.

Ok, I am going to buy more mem right now to see what you see.

Regards,
Stephan



^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [2.4.17/18pre] VM and swap - it's really unusable
  2002-01-04  0:19   ` Stephan von Krawczynski
  2002-01-04  5:26     ` Ken Brownfield
@ 2002-01-04 20:15     ` Andreas Hartmann
  2002-01-04 20:55       ` Stephan von Krawczynski
  2002-01-05  9:24       ` Petro
  1 sibling, 2 replies; 49+ messages in thread
From: Andreas Hartmann @ 2002-01-04 20:15 UTC (permalink / raw)
  To: Stephan von Krawczynski; +Cc: Ken Brownfield, Kernel-Mailingliste

Stephan von Krawczynski wrote:

>>Unfortunately, I lost the response that basically said "2.4 looks   
>>stable                                                                
>>to me", but let me count the ways in which I agree with Andreas'    
>>sentiment:                                                          
>>                                                                    
>>A) VM has major issues                                              

Unfortunately you are right.

>>
>                                                                       
> On all boxes I run currently (all 1GB or below RAM), I cannot find    
> _major_ issues.                                                       

The question is: what is the nature of your application / system load? You 
wrote something about a database server. How many rows altogether? What's 
the size of the table(s)? How many concurrent accesses do you have? Do 
you do "easy" searches where all of the conditions are located in the 
index? How big is your index? How big is the throughput of your 
database? Do you have your tables on raw partitions (without caching; as 
you can do it with UDB)?

You mentioned squid, too. I'm running squid here on an AMD K6-2 400, 256 
MB RAM. It's mostly for my own use (sometimes plus my wife). No more users. 
In this situation, I can't see any problem either. Why? There is no load, 
no throughput, ... .

How big are the partitions you are mounting at once? In my case, all the 
partitions together have about 70GB (all reiserfs).

I want to know this because I think the problem depends on how much 
different HD-memory is accessed. If you have applications which don't 
access too much memory, you can't see the problems.
If you access more than 1G (and you do not just copy, but rsync e.g.) 
and you have only 512MB of RAM, the machine swaps a lot with most current 
2.4 kernels (patches).

Another question:
Are there any tools to measure the data throughput an application causes? 
Interesting would be the sum at the end of the process, the maximum and 
average throughput (input and output separated) and the same for swap activity.
It could probably help to find optimization potential. At least it would 
give the chance to directly compare the demand of different applications.

Regards,
Andreas Hartmann

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [2.4.17/18pre] VM and swap - it's really unusable
  2002-01-04 20:15     ` Andreas Hartmann
@ 2002-01-04 20:55       ` Stephan von Krawczynski
  2002-01-05  8:39         ` Andreas Hartmann
  2002-01-05  9:24       ` Petro
  1 sibling, 1 reply; 49+ messages in thread
From: Stephan von Krawczynski @ 2002-01-04 20:55 UTC (permalink / raw)
  To: Andreas Hartmann; +Cc: brownfld, linux-kernel

On Fri, 04 Jan 2002 21:15:42 +0100
Andreas Hartmann <andihartmann@freenet.de> wrote:

[I will not answer all of your questions, as this is a matter of business, too]

> > On all boxes I run currently (all 1GB or below RAM), I cannot find    
> > _major_ issues.                                                       
> 
> 
> Question is: which nature is your application / load of the system?

Generally we do not drive the boxes up to the edge. Our philosophy is to throw
money at the problem, before it actually arises. Yes, I can see the future ...
;-)

> [...] Do you have your tables on raw partitions (without caching; as 
> you can do it with UDB)?

No.

> How big are the partitions you are mounting at once? In my case, all the 
> partitions together have about 70GB (all reiserfs).

about 130 GB, all reiserfs.

> I want to know it, because I think the problem depends on how much 
> different HD-memory is accessed.

I guess you should tilt that theory.
Have you already tried to throw a big SPARC at the problem?

> If you have applications, which doesn't 
> access to much memory, you can't view the problems.
> If you access more than 1G (and you do not just copy, but rsync e.g.) 
> and you have only 512MB of RAM, the machine swaps a lot with most actual 
> 2.4.-kernels (patches).

Can you provide a simple and reproducible test case (e.g. some demo source),
where things break? I am very willing to test it here.

Regards,
Stephan



^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [2.4.17/18pre] VM and swap - it's really unusable
  2002-01-04 13:03       ` Stephan von Krawczynski
@ 2002-01-04 23:50         ` Ken Brownfield
  2002-01-05 15:08           ` Stephan von Krawczynski
  0 siblings, 1 reply; 49+ messages in thread
From: Ken Brownfield @ 2002-01-04 23:50 UTC (permalink / raw)
  To: Stephan von Krawczynski; +Cc: linux-kernel

On Fri, Jan 04, 2002 at 02:03:21PM +0100, Stephan von Krawczynski wrote:
[...]
| Ok. It would be really nice to know if the -aa patches do any good at your

I'd love to, but unfortunately my problems reproduce only in production,
and -- nothing against Andrea -- I'm hesitant to deploy -aa live, since
it hasn't received the widespread use that mainline has.  I may be
forced to soon if the VM fixes don't get merged.

[...]
| > Do they have *sustained* heavy hit/IRQ/IO load?  For example, sending
| > 25Mbit and >1,000 connections/s of sustained small images traffic
| > through khttpd will kill 2.4 (slow loss of timer and eventual total
| > freeze) in a couple of hours.  Trivially reproducible for me on SMP with
| > any amount of memory.  On HP, Tyan, Intel, Asus... etc.
| 
| Hm, I have about 24GB of NFS traffic every day, which may be too less. What
| exactly are you seeing in this case (logfiles etc.)?

Well, the nature of the problem is that the timer "slows" and stops,
causing the machine to get more and more sluggish until it falls off the
net and stops dead.

I suspect that high IRQ rates cause the issue -- large sequential
transfers are not necessarily culprits due to the lowish overhead.

[...]
| > It's not that the kernel is bad, it's that there are specific things
| > that shouldn't be forgotten because of a "the kernel is good"
| > evaluation.
| 
| Hopefully nobody does this here, I don't.

I don't think it's intentional, and I realize that VM changes are hard
to swallow in a stable kernel release.  I just hope that the severity
and fairly wide negative effect is enough to make people more
comfortable with accepting VM fixes that may be somewhat invasive.

Thanks,
-- 
Ken.
brownfld@irridia.com

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [2.4.17/18pre] VM and swap - it's really unusable
  2002-01-04 20:55       ` Stephan von Krawczynski
@ 2002-01-05  8:39         ` Andreas Hartmann
  2002-01-05 12:59           ` M. Edward (Ed) Borasky
  0 siblings, 1 reply; 49+ messages in thread
From: Andreas Hartmann @ 2002-01-05  8:39 UTC (permalink / raw)
  To: Stephan von Krawczynski; +Cc: brownfld, linux-kernel

Stephan von Krawczynski wrote:

[...]

>>If you have applications, which doesn't 
>>access to much memory, you can't view the problems.
>>If you access more than 1G (and you do not just copy, but rsync e.g.) 
>>and you have only 512MB of RAM, the machine swaps a lot with most actual 
>>2.4.-kernels (patches).
>>
> 
> Can you provide a simple and reproducible test case (e.g. some demo source),
> where things break? I am very willing to test it here.
> 

It's easy - take a grown inn-newsserver-partition with reiserfs (*) (a 
lot of small files and a lot of directories), about 1.3 GB or more, and 
do a complete rsync to this partition to transport it somewhere else. 
But you have to do it with an existing target, not an empty one, so that 
rsync must scan the whole target partition, too.

I don't like special test-programs. They seldom reflect reality. 
What we need is a kernel that behaves well in reality - not in testcases.
And before starting the test, make sure that most of the RAM is already 
used for cache or buffers or applications.

I did this test with several VM-patches and there are huge differences 
in swap consumption between them: 319MB with 2.4.17rc2 and 59MB with 
2.4.17 oom-patch (max).
It's more than a little difference :-).

Regards,
Andreas Hartmann

(*) If I had DSL, I would send it to you (as tar.gz) - but with modem, 
it's a bit too much :-)!
But your squid cache should be fine, too. It has a similar structure: a 
lot of small files and a lot of subdirectories. But I think, that your 
squid cache size isn't as high as my inn-partition.

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [2.4.17/18pre] VM and swap - it's really unusable
  2002-01-04 20:15     ` Andreas Hartmann
  2002-01-04 20:55       ` Stephan von Krawczynski
@ 2002-01-05  9:24       ` Petro
  2002-01-05 15:44         ` Stephan von Krawczynski
  1 sibling, 1 reply; 49+ messages in thread
From: Petro @ 2002-01-05  9:24 UTC (permalink / raw)
  To: Andreas Hartmann; +Cc: Kernel-Mailingliste

"We" (Auctionwatch.com) are experiencing problems that appear to be
related to VM, I realize that this question was not directed at me:

On Fri, Jan 04, 2002 at 09:15:42PM +0100, Andreas Hartmann wrote:
> Stephan von Krawczynski wrote:
> Question is: which nature is your application / load of the system? You 
> wrote something about database server. How much rows alltogether? What's 

    MySQL running on a dual 650 PIII, 2 gig ram. Rows? Dunno, but several
    million tables (about 85 gig of tables averaging 45-50k IIRC). 

> the size of the table(s)? How many concurrent accesses do you have? Do

    We will have 2-400+ tables open at once. 

> you do "easy" searches where all of the conditions are located in the 
> index? How big is your index? How big is the throughput of your 
> database? Do you have your tables on raw partitions (without caching; as 
> you can do it with UDB)?

    I don't know much about the specific design, other than I've been
    told it's non-optimal. 

> How big are the partitions you are mounting at once? In my case, all the 
> partitions together have about 70GB (all reiserfs).

    One 250G logical volume, a couple smaller ones (3 gig, 30 gig). 

-- 
Share and Enjoy. 

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [2.4.17/18pre] VM and swap - it's really unusable
  2002-01-05  8:39         ` Andreas Hartmann
@ 2002-01-05 12:59           ` M. Edward (Ed) Borasky
  2002-01-05 15:09             ` Andreas Hartmann
  2002-01-06 15:51             ` vda
  0 siblings, 2 replies; 49+ messages in thread
From: M. Edward (Ed) Borasky @ 2002-01-05 12:59 UTC (permalink / raw)
  To: Andreas Hartmann; +Cc: Stephan von Krawczynski, brownfld, linux-kernel

On Sat, 5 Jan 2002, Andreas Hartmann wrote:

> I don't like special test-programs. They seldom show up the reality.
> What we need is a kernel that behaves fine in reality - not in
> testcases.  And before starting the test, take care, that most of ram
> is already used for cache or buffers or applications.

OK, here's some pseudo-code for a real-world test case. I haven't had a
chance to code it up, but I'm guessing I know what it's going to do. I'd
*love* to be proved wrong :).

# build and boot a kernel with "Magic SysRq" turned on
# echo  1 > /proc/sys/kernel/sysrq
# fire up "nice --19 top" as "root"
# read "MemTotal" from /proc/meminfo

# now start the next two jobs concurrently

# write a disk file with "MemTotal" data or more in it

# perform a 2D in-place FFT of total size at least "MemTotal/2" but less
# than "MemTotal"
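A minimal sketch of how the sizes for the two concurrent jobs could be
derived from /proc/meminfo. The function names, the 1 MiB block size and
the 3/4 factor for the FFT working set are assumptions made for
illustration; the steps above only require a file of at least "MemTotal"
and an FFT between "MemTotal/2" and "MemTotal".

```python
import re


def mem_total_kib(meminfo_text):
    """Return the MemTotal figure (the number printed before 'kB')."""
    match = re.search(r"^MemTotal:\s+(\d+)\s+kB", meminfo_text, re.MULTILINE)
    if match is None:
        raise ValueError("no MemTotal line found")
    return int(match.group(1))


def plan_test(meminfo_text, block_kib=1024):
    """Size the file write and the FFT working set from MemTotal."""
    total = mem_total_kib(meminfo_text)
    # File write: at least MemTotal worth of data, rounded up to whole blocks.
    dd_blocks = -(-total // block_kib)
    # FFT working set: anywhere in [MemTotal/2, MemTotal); 3/4 is an
    # arbitrary pick inside that range, not a value from the test recipe.
    fft_kib = total * 3 // 4
    return {"dd_blocks": dd_blocks, "dd_block_kib": block_kib, "fft_kib": fft_kib}
```

On a live system this would be fed with open("/proc/meminfo").read();
"dd_blocks" can then be used as the count for a dd with bs=1024k.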

Watch the "top" window like a hawk. "Cached" will grow because of the
disk write and "free" will drop because the page cache is growing and
the 2D FFT is using *its* memory. Eventually the two will start
competing for the last bits of free memory. "kswapd" and "kupdated" will
start working furiously, bringing the system CPU utilization to 99+
percent.  At this point the system will appear highly unresponsive.

Even with the "nice --19" setting, "top" is going to have a hard time
keeping its five-second screen updates going. You will quite possibly
end up going to the console and doing alt-sysrq-m, which dumps the
memory status on the console and into /var/log/messages. Then if you do
alt-sysrq-i, which kills everything but "init", you should be able to
log on again.

I'm going to try this on my 512 MB machine just to see what happens, but
I'd like to see what someone with a larger machine, say 4 GB, gets when
they do this. I think attempting to write a large file and do a 2D FFT
concurrently is a perfectly reasonable thing to expect an image
processing system to do in the real world. A "traditional" UNIX would do
the I/O of the file write and the compute/memory processing of the FFT
together with little or no problem. But because the 2.4 kernel insists
on keeping all those buffers around, the 2D FFT is going to have
difficulty, because it has to have its data in core.

What's worse is if the page cache gets so big that the FFT has to start
swapping. For those who aren't familiar with 2D FFTs, they take two
passes over the data. The first pass will be unit strides -- sequential
addresses. But the second pass will be large strides -- a power of two.
That second pass is going to be brutal if every page it hits has to be
swapped in!

The solution is to limit page cache size to, say, 1/4 of "MemTotal",
which I'm guessing will have a *negligible* impact on the performance of
the file write. I used to work in an image processing lab, which is
where I learned this little trick for bringing a VM to its knees, and
which is probably where the designers of other UNIX systems learned that
the memory used for buffering I/O needs to be limited :). There's
probably a VAX or two out there still that shudders when it remembers
what I did to it. :))

-- 
M. Edward Borasky

znmeb@borasky-research.net
http://www.borasky-research.net

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [2.4.17/18pre] VM and swap - it's really unusable
  2002-01-04 23:50         ` Ken Brownfield
@ 2002-01-05 15:08           ` Stephan von Krawczynski
  2002-01-05 21:40             ` Ken Brownfield
                               ` (2 more replies)
  0 siblings, 3 replies; 49+ messages in thread
From: Stephan von Krawczynski @ 2002-01-05 15:08 UTC (permalink / raw)
  To: Ken Brownfield; +Cc: linux-kernel

On Fri, 4 Jan 2002 17:50:50 -0600
Ken Brownfield <brownfld@irridia.com> wrote:

> On Fri, Jan 04, 2002 at 02:03:21PM +0100, Stephan von Krawczynski wrote:
> [...]
> | Ok. It would be really nice to know if the -aa patches do any good at your
> 
> I'd love to, but unfortunately my problems reproduce only in production,
> and -- nothing against Andrea -- I'm hesitant to deploy -aa live, since
> it hasn't received the widespread use that mainline has.  I may be
> forced to soon if the VM fixes don't get merged.

I am pretty impressed by Martin's test case, where nearly all VM patches fail
with the exception of his own :-) The thing is, this test is not "very special"
in nature but more like "system driven to its limit by normal processes". And
this is the really interesting part about it.

> | Hm, I have about 24GB of NFS traffic every day, which may be too less. What
> | exactly are you seeing in this case (logfiles etc.)?
> 
> Well, the nature of the problem is that the timer "slows" and stops,
> causing the machine to get more and more sluggish until it falls of the
> net and stops dead.
> 
> I suspect that high IRQ rates cause the issue -- large sequential
> transfers are not necessarily culprits due the lowish overhead.

What exactly do you mean by "high IRQ rate"? Can you show some numbers from
/proc/interrupts and uptime for clarification?
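One way such numbers could be condensed is an average interrupt rate per
IRQ line since boot. The following is only a sketch: the function name is
made up, and it assumes the usual layout of /proc/interrupts (a header
line naming the CPUs, then one "label: count count ... description" line
per IRQ) and of /proc/uptime (uptime in seconds as the first field).

```python
def irq_rates(interrupts_text, uptime_text):
    """Average interrupts per second since boot, keyed by IRQ label."""
    uptime_s = float(uptime_text.split()[0])
    lines = interrupts_text.strip().splitlines()
    # The header line has one column per CPU (CPU0, CPU1, ...).
    ncpu = len(lines[0].split())
    rates = {}
    for line in lines[1:]:
        label, _, rest = line.partition(":")
        counts = []
        # Per-CPU counters come first; stop at the first non-numeric token.
        for tok in rest.split()[:ncpu]:
            if not tok.isdigit():
                break
            counts.append(int(tok))
        if counts:
            rates[label.strip()] = sum(counts) / uptime_s
    return rates
```

This gives an average over the whole uptime only; for a machine that
degrades over a couple of hours, two snapshots taken some minutes apart
and subtracted from each other would say more.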

> | Hopefully nobody does this here, I don't.
> 
> I don't think it's intentional, and I realize that VM changes are hard
> to swallow in a stable kernel release.  I just hope that the severity
> and fairly wide negative effect is enough to make people more
> comfortable with accepting VM fixes that may be somewhat invasive.

Hm, I don't think really "big" patches are needed. According to Martin's test,
Rik's patch is no gain currently, as rmap flops in this test, too.

The problem is: you should really use one of your problem machines for at least
very simple testing. If you don't, you cannot really expect your problem to be
solved soon. We would need input from your side. If I were you, I'd start off
with Martin's patch. It is simple (very simple indeed), small and confined to a
single procedure. Martin's test shows good results under "normal" high load (not
especially IRQ) and no difference under standard load; I cannot see a risk of
oops or deadlock.

Regards,
Stephan

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [2.4.17/18pre] VM and swap - it's really unusable
  2002-01-05 12:59           ` M. Edward (Ed) Borasky
@ 2002-01-05 15:09             ` Andreas Hartmann
  2002-01-05 17:51               ` M. Edward (Ed) Borasky
  2002-01-06 15:51             ` vda
  1 sibling, 1 reply; 49+ messages in thread
From: Andreas Hartmann @ 2002-01-05 15:09 UTC (permalink / raw)
  To: M. Edward (Ed) Borasky; +Cc: Stephan von Krawczynski, brownfld, linux-kernel

M. Edward (Ed) Borasky wrote:

> On Sat, 5 Jan 2002, Andreas Hartmann wrote:
> 
> 
>>I don't like special test-programs. They seldom show up the reality.
>>What we need is a kernel that behaves fine in reality - not in
>>testcases.  And before starting the test, take care, that most of ram
>>is already used for cache or buffers or applications.
>>
> 
> OK, here's some pseudo-code for a real-world test case. I haven't had a
> chance to code it up, but I'm guessing I know what it's going to do. I'd
> *love* to be proved wrong :).


I would like to try it with the oom-patch, which needed less swap in my 
tests. It could be a good test to verify the results of the rsync-test.


> # build and boot a kernel with "Magic SysRq" turned on
> # echo  1 > /proc/sys/kernel/sysrq
> # fire up "nice --19 top" as "root"
> # read "MemTotal" from /proc/meminfo
> 
> # now start the next two jobs concurrently
> 
> # write a disk file with "MemTotal" data or more in it
> 
> # perform a 2D in-place FFT of total size at least "MemTotal/2" but less
> # than "MemTotal"
> 
> Watch the "top" window like a hawk. "Cached" will grow because of the
> disk write and "free" will drop because the page cache is growing and
> the 2D FFT is using *its* memory.


Could you please tell me a program that does 2D FFT? I would like to 
do this test, too!

Regards,
Andreas Hartmann


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [2.4.17/18pre] VM and swap - it's really unusable
  2002-01-05  9:24       ` Petro
@ 2002-01-05 15:44         ` Stephan von Krawczynski
  2002-01-07  7:15           ` Petro
  0 siblings, 1 reply; 49+ messages in thread
From: Stephan von Krawczynski @ 2002-01-05 15:44 UTC (permalink / raw)
  To: Petro; +Cc: andihartmann, linux-kernel

On Sat, 5 Jan 2002 01:24:42 -0800
Petro <petro@auctionwatch.com> wrote:

> "We" (Auctionwatch.com) are experiencing problems that appear to be
> related to VM, I realize that this question was not directed at me:

And what exactly do the problems look like?

Regards,
Stephan


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [2.4.17/18pre] VM and swap - it's really unusable
  2002-01-05 15:09             ` Andreas Hartmann
@ 2002-01-05 17:51               ` M. Edward (Ed) Borasky
  0 siblings, 0 replies; 49+ messages in thread
From: M. Edward (Ed) Borasky @ 2002-01-05 17:51 UTC (permalink / raw)
  To: Andreas Hartmann; +Cc: Stephan von Krawczynski, brownfld, linux-kernel

On Sat, 5 Jan 2002, Andreas Hartmann wrote:

> Could you please tell me a program that does 2D FFT? I would like to
> do this test, too!

Try http://www.fftw.org. This is a free (GPL, I think) general
purpose FFT library. If I get a chance I'll download it this weekend and
figure out how to code a 2D FFT.

--
M. Edward Borasky

znmeb@borasky-research.net
http://www.borasky-research.net

Never play leapfrog with a unicorn.


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [2.4.17/18pre] VM and swap - it's really unusable
  2002-01-05 15:08           ` Stephan von Krawczynski
@ 2002-01-05 21:40             ` Ken Brownfield
  2002-01-06 15:48               ` Stephan von Krawczynski
  2002-01-07  1:42             ` Rik van Riel
  2002-01-08 15:19             ` Update " Ken Brownfield
  2 siblings, 1 reply; 49+ messages in thread
From: Ken Brownfield @ 2002-01-05 21:40 UTC (permalink / raw)
  To: Stephan von Krawczynski; +Cc: linux-kernel

On Sat, Jan 05, 2002 at 04:08:33PM +0100, Stephan von Krawczynski wrote:
| I am pretty impressed by Martins test case where merely all VM patches fail
| with the exception of his own :-) The thing is, this test is not of nature
| "very special" but more like "system driven to limit by normal processes". And
| this is the real interesting part about it.

One problem is that I've never heard of it and don't know where to get
it. ;)

| What exactly do you mean by "high IRQ rate"? Can you show some numbers from
| /proc/interrupts and uptime for clarification?

I did, back in the archives.  I don't have easy access to archives etc,
right now, but I might repost since it's been a while.

| The problem is: you should really use one of your problem machines for at least
| very simple testing. If you don't you possibly cannot expect your problem to be
| solved soon. We would need input from your side. If I were you, I'd start of
| with Martins patch. It is simple (very simple indeed), small and pinned to a
| single procedure. Martins test shows - under "normal" high load (not especially
| IRQ) - good result and no difference in standard load, I cannot see a risk for
| oops or deadlock.

Well, reboots are the problem over possible oopses (or data corruption,
even more fun.)  But on your recommendation I'll give Martin's mod a
try, given a URL.  Does Martin's patch play well with -aa?  How about
Martin+10_vm in -pre2? ;-)

At any rate, right now there are three or four people with different VM
patch sets, probably more.  There is a certain amount of work this group
can do in judging which concepts are cleaner or most suitable to 2.4.x.
It would be cool to give rmap a try, but I don't want to maintain a
2.4.x kernel with speculative features that aren't intended for 2.4.x.

I can see using patches back-ported from 2.5, but I'm a firm believer
that 2.4 should stay stable and that the benefit of 2.4 to admins is the
control by the maintainer and stability -- not the VM of the month.

I can test, but it's slow going with so many patches.  And many of the
patches haven't been properly merged with any kernel (e.g., -aa 10_vm
reverting previously applied 2.4 changes, etc.)

While I've reproduced the issues and explained them here in the past,
it's difficult for me to iterate fast enough in an environment that
easily reproduces the problem.  I'm iterating as fast as I can, but when
I do iterate I'd prefer some support from the maintainers or other parts
of the community that "Yes, this patch has a good chance of fixing the
specific problems we've been seeing, give it a try."  Right now that
doesn't exist (with the exception of your recommendation of this Martin
patch), and that's one reason I'm hesitant to iterate too much and
affect a lot of people.

Thanks,
-- 
Ken.
brownfld@irridia.com

| 
| Regards,
| Stephan
| 

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [2.4.17/18pre] VM and swap - it's really unusable
  2002-01-05 21:40             ` Ken Brownfield
@ 2002-01-06 15:48               ` Stephan von Krawczynski
  2002-01-08  5:09                 ` Ken Brownfield
  0 siblings, 1 reply; 49+ messages in thread
From: Stephan von Krawczynski @ 2002-01-06 15:48 UTC (permalink / raw)
  To: Ken Brownfield; +Cc: linux-kernel

On Sat, 5 Jan 2002 15:40:53 -0600
Ken Brownfield <brownfld@irridia.com> wrote:

> One problem is that I've never heard of it and don't know where to get
> it. ;)

[Sent in off-LKML mail]

> | What exactly do you mean by "high IRQ rate"? Can you show some numbers from
> | /proc/interrupts and uptime for clarification?
> 
> I did, back in the archives.  I don't have easy access to archives etc,
> right now, but I might repost since it's been a while.

I read all your LKML mails since the beginning of November and could find a lot
about CPUs, configs, top output etc., but not a single "cat /proc/interrupts"
together with uptime.

> Well, reboots are the problem over possible oopses (or data corruption,
> even more fun.)  But on your recommendation I'll give Martin's mod a
> try, given a URL.  Does Martin's patch play well with -aa?  How about
> Martin+10_vm in -pre2? ;-)

Judging from your mails, you seem to be trying really a lot of things to make
it work. I recommend not intermixing the patches. I would stay close to
marcelo's tree and try _single_ small patches on top of that. If you mix them
up (even only two of them) you won't be able to track down very well what is
really better or worse.

One thing I would like to ask here is this (as you are dealing with oracle
stuff): why does oracle recommend to compile the kernel in 486 mode? I talked
to someone who uses oracle on 2.4.x and he told me it is even in the latest
docs. What is the voodoo behind that? Btw he has no freezes or the like, but
occasional coredumps from oracle processes, which he states as "not nice, but
no showstopper" as his clients reconnect/retransmit with only a slight delay.
This may be related to VM, that's why I will try to convince him of some patches
:-) and have a look at the coredump-frequency.

Regards,
Stephan

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [2.4.17/18pre] VM and swap - it's really unusable
  2002-01-05 12:59           ` M. Edward (Ed) Borasky
  2002-01-05 15:09             ` Andreas Hartmann
@ 2002-01-06 15:51             ` vda
  2002-01-06 19:16               ` M. Edward (Ed) Borasky
  1 sibling, 1 reply; 49+ messages in thread
From: vda @ 2002-01-06 15:51 UTC (permalink / raw)
  To: M. Edward (Ed) Borasky; +Cc: linux-kernel

On 5 January 2002 10:59, M. Edward (Ed) Borasky wrote:

> OK, here's some pseudo-code for a real-world test case. I haven't had a
> chance to code it up, but I'm guessing I know what it's going to do. I'd
> *love* to be proved wrong :).
>
> # build and boot a kernel with "Magic SysRq" turned on
> # echo  1 > /proc/sys/kernel/sysrq
> # fire up "nice --19 top" as "root"
> # read "MemTotal" from /proc/meminfo
>
> # now start the next two jobs concurrently
>
> # write a disk file with "MemTotal" data or more in it

Like dd if=/dev/zero of=/tmp/file bs=... count=... ?

> # perform a 2D in-place FFT of total size at least "MemTotal/2" but less
> # than "MemTotal"

I'm willing to try. What program can I use for FFT?

> What's worse is if the page cache gets so big that the FFT has to start
> swapping. For those who aren't familiar with 2D FFTs, they take two
> passes over the data. The first pass will be unit strides -- sequential
> addresses. But the second pass will be large strides -- a power of two.
> That second pass is going to be brutal if every page it hits has to be
> swapped in!

Can you describe FFT memory access pattern in more detail?
I'd like to write a simple testcase with similar 'bad' pattern.
--
vda

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [2.4.17/18pre] VM and swap - it's really unusable
  2002-01-06 15:51             ` vda
@ 2002-01-06 19:16               ` M. Edward (Ed) Borasky
  2002-01-06 19:38                 ` Alan Cox
  0 siblings, 1 reply; 49+ messages in thread
From: M. Edward (Ed) Borasky @ 2002-01-06 19:16 UTC (permalink / raw)
  To: vda@port.imtp.ilyichevsk.odessa.ua; +Cc: linux-kernel

On Sun, 6 Jan 2002, vda@port.imtp.ilyichevsk.odessa.ua wrote:

> Like dd if=/dev/zero of=/tmp/file bs=... count=... ?
>
That would do it, but I was trying to give a real-world example from
image processing, like copying a large image file.

> > # perform a 2D in-place FFT of total size at least "MemTotal/2" but less
> > # than "MemTotal"
>
> I'm willing to try. What program can I use for FFT?

I use FFTW from http://www.fftw.org.

> Can you describe FFT memory access pattern in more detail?
> I'd like to write a simple testcase with similar 'bad' pattern.

Imagine a 16384 by 16384 array of double complex values. That's a 4
GByte image. Scale down to fit your machine, of course :). The first
pass will do an FFT on every row (column) if your language is C
(FORTRAN). The "stride" is 16 bytes (one complex value) in the inner
loop. Each row (column) is 16384*16 = 262144 bytes long, which works out
to 64 pages if the page size is 4096 bytes.

Then the second pass will do an FFT on every column (row). The stride is
16384*16 = 262144 bytes. This is a new page for each 16-byte complex
value you process :-). That is, all 16384 pages have to be in memory, or
swapped into memory if you've run out of real memory and the kernel has
swapped them out.

Please ... *don't* try to do this on a 512 MB machine and think that an
efficient VM is gonna make it work :),
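
A minimal sketch of a test case with this access pattern, assuming a
4096-byte page size and a row-major (C) array of 16-byte complex values.
The names cplx, array_bytes, page_switches and sweep are made up for
illustration; this is not FFTW code and it performs no FFT, it only
reproduces the order in which memory is touched.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define PAGE_BYTES 4096u  /* assumed page size, as in the example above */

/* One double complex value: 16 bytes on common ABIs. */
typedef struct { double re, im; } cplx;

/* Total size in bytes of an n x n array of cplx values. */
unsigned long long array_bytes(unsigned long long n)
{
    return n * n * sizeof(cplx);
}

/*
 * Walk an n x n row-major array by rows (by_column == 0) or by columns
 * (by_column != 0) and count how often an access lands on a different
 * page than the previous access. Only offsets are computed; no memory
 * is allocated, so this is safe to run for sizes that fit in size_t.
 */
size_t page_switches(size_t n, int by_column)
{
    size_t switches = 0;
    size_t last_page = (size_t)-1;

    for (size_t outer = 0; outer < n; outer++) {
        for (size_t inner = 0; inner < n; inner++) {
            size_t row = by_column ? inner : outer;
            size_t col = by_column ? outer : inner;
            size_t page = ((row * n + col) * sizeof(cplx)) / PAGE_BYTES;

            if (page != last_page) {
                switches++;
                last_page = page;
            }
        }
    }
    return switches;
}

/*
 * Touch every element of a real n x n array in the chosen order.
 * Returns the number of elements touched. Run this on an array close
 * to the size of RAM to exercise the VM with the same pattern.
 */
size_t sweep(cplx *a, size_t n, int by_column)
{
    size_t touched = 0;

    for (size_t outer = 0; outer < n; outer++) {
        for (size_t inner = 0; inner < n; inner++) {
            size_t row = by_column ? inner : outer;
            size_t col = by_column ? outer : inner;

            a[row * n + col].re += 1.0;
            touched++;
        }
    }
    return touched;
}
```

For a scaled-down 256 by 256 array each row is exactly one page, so
page_switches(256, 0) gives 256 while page_switches(256, 1) gives 65536:
the column pass changes page on every single access.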
--
M. Edward Borasky

znmeb@borasky-research.net
http://www.borasky-research.net

What phrase will you *never* hear Miss Piggy use?
"You can't make a silk purse out of a sow's ear!"

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [2.4.17/18pre] VM and swap - it's really unusable
  2002-01-06 19:16               ` M. Edward (Ed) Borasky
@ 2002-01-06 19:38                 ` Alan Cox
  2002-01-07  0:47                   ` M. Edward Borasky
  0 siblings, 1 reply; 49+ messages in thread
From: Alan Cox @ 2002-01-06 19:38 UTC (permalink / raw)
  To: "M. Edward (Ed) Borasky"
  Cc: vda@port.imtp.ilyichevsk.odessa.ua, linux-kernel

> >
> That would do it, but I was trying to give a real-world example from
> image processing, like copying a large image file.

Image processing people use tiling. Try loading a giant image into
the gimp and into a non smart application like xpaint. The difference is
huge just by careful implementation of the algorithms

> Then the second pass will do an FFT on every column (row). The stride is
> 16384*16 = 262144 bytes. This is a new page for each 16-byte complex
> value you process :-). That is, all 16384 pages have to be in memory, or
> swapped into memory if you've run out of real memory and the kernel has
> swapped them out.

Yes but you don't do it that way, you do stripes of parallel fft
computations. We can all write dumb programs that don't behave well with the
VM layer.
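
A rough sketch of what striping buys in terms of page locality, under
the same assumptions as the example being discussed (4096-byte pages,
16-byte elements, row-major layout). The function name
striped_page_switches is invented here, and the loop only models the
order of accesses when a stripe of adjacent columns is processed
together; it is not an FFT implementation.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_BYTES 4096u  /* assumed page size */
#define ELEM_BYTES 16u    /* one double complex value */

/*
 * Process the columns of an n x n row-major array in stripes of
 * `width` adjacent columns: for each stripe, visit every row and
 * touch all columns of the stripe in that row before moving on.
 * Returns how often an access lands on a different page than the
 * previous access. width == 1 is the naive column-at-a-time walk.
 */
size_t striped_page_switches(size_t n, size_t width)
{
    size_t switches = 0;
    size_t last_page = (size_t)-1;

    if (width == 0)
        width = 1;  /* treat a zero width as column-at-a-time */

    for (size_t c0 = 0; c0 < n; c0 += width) {
        size_t c1 = (c0 + width < n) ? c0 + width : n;

        for (size_t row = 0; row < n; row++) {
            for (size_t col = c0; col < c1; col++) {
                size_t page = ((row * n + col) * ELEM_BYTES) / PAGE_BYTES;

                if (page != last_page) {
                    switches++;
                    last_page = page;
                }
            }
        }
    }
    return switches;
}
```

For a 1024 by 1024 array, a stripe of 256 columns covers exactly one
page per row, so striped_page_switches(1024, 256) is 4096 (each page is
entered once), against 1048576 for striped_page_switches(1024, 1).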

Alan

^ permalink raw reply	[flat|nested] 49+ messages in thread

* RE: [2.4.17/18pre] VM and swap - it's really unusable
  2002-01-06 19:38                 ` Alan Cox
@ 2002-01-07  0:47                   ` M. Edward Borasky
  0 siblings, 0 replies; 49+ messages in thread
From: M. Edward Borasky @ 2002-01-07  0:47 UTC (permalink / raw)
  To: linux-kernel

You're right ... no one does an *out-of-core* 2D FFT using VM. What I am
saying is that a large page cache can turn an *in-core* 2D FFT -- a 4 GB
case on an 8 GB machine, for example -- into an out-of-core one!

One other data point: on my stock Red Hat 7.2 box with 512 MB of RAM, I ran
a Perl script that builds a 512 MByte hash, a second Perl script which
creates a 512 MByte disk file, and the check pass of FFTW concurrently. As I
expected, the two Perl scripts competed for RAM and slowed down FFTW. What
was even more interesting, though, was that the VM apparently functions
correctly in this instance. All three of the processes were getting CPU
cycles. And I never saw "kswapd" or "kupdated" take over the system.

Although the page cache did get large at one point, once the hash builder
got to about 400 MBytes in size, the "cached" piece shrunk to about 10
MBytes and most of the RAM got allocated to the hash builder, as did
appropriate amounts of swap. In short, the kernel in Red Hat 7.2 with under
1 GByte of memory is behaving well under memory pressure. It looks like it's
kernels beyond that one that have the problems, and also systems with more
than 1 GByte. If I had the money, I'd stuff some more RAM in the machine and
see if I could isolate this a little further. If anyone wants my Perl
scripts, which are trivial, let me know.
--
M. Edward Borasky
znmeb@borasky-research.net
http://www.borasky-research.net

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [2.4.17/18pre] VM and swap - it's really unusable
  2002-01-05 15:08           ` Stephan von Krawczynski
  2002-01-05 21:40             ` Ken Brownfield
@ 2002-01-07  1:42             ` Rik van Riel
  2002-01-07  2:22               ` Rik van Riel
  2002-01-08 15:19             ` Update " Ken Brownfield
  2 siblings, 1 reply; 49+ messages in thread
From: Rik van Riel @ 2002-01-07  1:42 UTC (permalink / raw)
  To: Stephan von Krawczynski; +Cc: Ken Brownfield, linux-kernel

On Sat, 5 Jan 2002, Stephan von Krawczynski wrote:

> I am pretty impressed by Martins test case where merely all VM patches
> fail with the exception of his own :-)

No big wonder if both -aa and -rmap only get tested without swap ;)

Rik
-- 
Shortwave goes a long way:  irc.starchat.net  #swl

http://www.surriel.com/		http://distro.conectiva.com/


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [2.4.17/18pre] VM and swap - it's really unusable
  2002-01-07  1:42             ` Rik van Riel
@ 2002-01-07  2:22               ` Rik van Riel
  2002-01-07 14:20                 ` Stephan von Krawczynski
  0 siblings, 1 reply; 49+ messages in thread
From: Rik van Riel @ 2002-01-07  2:22 UTC (permalink / raw)
  To: Stephan von Krawczynski; +Cc: Ken Brownfield, linux-kernel

On Sun, 6 Jan 2002, Rik van Riel wrote:
> On Sat, 5 Jan 2002, Stephan von Krawczynski wrote:
>
> > I am pretty impressed by Martins test case where merely all VM patches
> > fail with the exception of his own :-)
>
> No big wonder if both -aa and -rmap only get tested without swap ;)

To be clear ... -aa and -rmap should of course also work
nicely without swap, no excuses for the bad behaviour
shown in Martin's test, but at the moment they simply
don't seem tuned for it.

regards,

Rik
-- 
Shortwave goes a long way:  irc.starchat.net  #swl

http://www.surriel.com/		http://distro.conectiva.com/


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [2.4.17/18pre] VM and swap - it's really unusable
  2002-01-05 15:44         ` Stephan von Krawczynski
@ 2002-01-07  7:15           ` Petro
  2002-01-07 14:33             ` Stephan von Krawczynski
  0 siblings, 1 reply; 49+ messages in thread
From: Petro @ 2002-01-07  7:15 UTC (permalink / raw)
  To: Stephan von Krawczynski; +Cc: andihartmann, linux-kernel

On Sat, Jan 05, 2002 at 04:44:05PM +0100, Stephan von Krawczynski wrote:
> On Sat, 5 Jan 2002 01:24:42 -0800
> Petro <petro@auctionwatch.com> wrote:
> 
> > "We" (Auctionwatch.com) are experiencing problems that appear to be
> > related to VM, I realize that this question was not directed at me:
> 
> And how exactly do the problems look like?

    After some time, ranging from 1 to 48 hours, mysql quits in an
    unclean fashion (dies leaving tables improperly closed) with a dump
    in the mysql log file that looks like: 

> Here is the stack dump:
> 0x807b75f handle_segfault__Fi + 383
> 0x812bcaa pthread_sighandler + 154
> 0x815059c chunk_free + 596
> 0x8152573 free + 155
> 0x811579c my_no_flags_free + 16
> 0x80764d5 _._5ilink + 61
> 0x807b48d end_thread__FP3THDb + 53
> 0x80809cc handle_one_connection__FPv + 996

    Which the Mysql support team says appears to be memory corruption.
    Since this has happened on 4 different machines, and one of them had
    memtest86 run on it (coming up clean), they seem (witness Sasha's
    post) to think this may have something to do with the memory
    handling in the kernel. 

    I haven't run it on a kernel that has debugging enabled yet,
    partially because I've been tracing a completely unrelated problem
    with our hard drives (IBM GXP 75G drives made in Hungary during the
    first 3 months of 2001), and partially because the only way to get
    this to happen is to put the database in production, which results
    in a crash, which takes our site offline, which costs us money and
    pisses off our users. Right now we're running on a sun e4500, and
    it's stable, so until we get the other problem worked out, we're
    waiting to see on this one. 


-- 
Share and Enjoy. 

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [2.4.17/18pre] VM and swap - it's really unusable
  2002-01-07  2:22               ` Rik van Riel
@ 2002-01-07 14:20                 ` Stephan von Krawczynski
  2002-01-08  0:36                   ` Rik van Riel
  0 siblings, 1 reply; 49+ messages in thread
From: Stephan von Krawczynski @ 2002-01-07 14:20 UTC (permalink / raw)
  To: Rik van Riel; +Cc: brownfld, linux-kernel

On Mon, 7 Jan 2002 00:22:09 -0200 (BRST)
Rik van Riel <riel@conectiva.com.br> wrote:

> On Sun, 6 Jan 2002, Rik van Riel wrote:
> > On Sat, 5 Jan 2002, Stephan von Krawczynski wrote:
> >
> > > I am pretty impressed by Martins test case where merely all VM patches
> > > fail with the exception of his own :-)
> >
> > No big wonder if both -aa and -rmap only get tested without swap ;)
> 
> To be clear ... -aa and -rmap should of course also work
> nicely without swap, no excuses for the bad behaviour
> shown in Martin's test, but at the moment they simply
> don't seem tuned for it.

Good to hear we agree it _should_ work. When does it (rmap)? 
;-)

Regards,
Stephan



^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [2.4.17/18pre] VM and swap - it's really unusable
  2002-01-07  7:15           ` Petro
@ 2002-01-07 14:33             ` Stephan von Krawczynski
  2002-01-07 20:29               ` Petro
  0 siblings, 1 reply; 49+ messages in thread
From: Stephan von Krawczynski @ 2002-01-07 14:33 UTC (permalink / raw)
  To: Petro; +Cc: andihartmann, linux-kernel

On Sun, 6 Jan 2002 23:15:31 -0800
Petro <petro@auctionwatch.com> wrote:

> On Sat, Jan 05, 2002 at 04:44:05PM +0100, Stephan von Krawczynski wrote:
> > On Sat, 5 Jan 2002 01:24:42 -0800
> > Petro <petro@auctionwatch.com> wrote:
> > 
> > > "We" (Auctionwatch.com) are experiencing problems that appear to be
> > > related to VM, I realize that this question was not directed at me:
> > 
> > And how exactly do the problems look like?
> 
>     After some time, ranging from 1 to 48 hours, mysql quits in an
>     unclean fashion (dies leaving tables improperly closed) with a dump
>     in the mysql log file that looks like: 

mysql question: is this a binary from some distro or self-compiled? If
self-compiled can you show your ./configure paras, please?

>     Which the Mysql support team says appears to be memory corruption.
>     Since this has happened on 4 different machines, and one of them had
>     memtest86 run on it (coming up clean), they seem (witness Sasha's
>     post) to think this may have something to do with the memory
>     handling in the kernel. 

There is a big difference between memory _corruption_ and a VM deficiency. No
app can cope with a _corruption_ and is perfectly allowed to core dump or exit
(or trash your disk). But this should not happen on allocation failures.

Unless all your RAM is from the same series I do not really believe in mem
corruption. I would try Martins small VM patch, as it looks like being a bit
more efficient in low mem conditions and this may well be the case you are
running into. This means 2.4.17 standard + patch.

Regards,
Stephan

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [2.4.17/18pre] VM and swap - it's really unusable
  2002-01-07 14:33             ` Stephan von Krawczynski
@ 2002-01-07 20:29               ` Petro
  2002-01-08  1:43                 ` Stephan von Krawczynski
  0 siblings, 1 reply; 49+ messages in thread
From: Petro @ 2002-01-07 20:29 UTC (permalink / raw)
  To: Stephan von Krawczynski; +Cc: andihartmann, linux-kernel

On Mon, Jan 07, 2002 at 03:33:48PM +0100, Stephan von Krawczynski wrote:
> On Sun, 6 Jan 2002 23:15:31 -0800
> Petro <petro@auctionwatch.com> wrote:
> > On Sat, Jan 05, 2002 at 04:44:05PM +0100, Stephan von Krawczynski wrote:
> > > On Sat, 5 Jan 2002 01:24:42 -0800
> > > Petro <petro@auctionwatch.com> wrote:
> > > > "We" (Auctionwatch.com) are experiencing problems that appear to be
> > > > related to VM, I realize that this question was not directed at me:
> > > And how exactly do the problems look like?
> >     After some time, ranging from 1 to 48 hours, mysql quits in an
> >     unclean fashion (dies leaving tables improperly closed) with a dump
> >     in the mysql log file that looks like: 
> mysql question: is this a binary from some distro or self-compiled? If
> self-compiled can you show your ./configure paras, please?

    It's the binary from mysql.com. 
 
> >     Which the Mysql support team says appears to be memory corruption.
> >     Since this has happened on 4 different machines, and one of them had
> >     memtest86 run on it (coming up clean), they seem (witness Sasha's
> >     post) to think this may have something to do with the memory
> >     handling in the kernel. 
> There is a big difference between memory _corruption_ and a VM deficiency. No
> app can cope with a _corruption_ and is perfectly allowed to core dump or exit
> (or trash your disk). But this should not happen on allocation failures.
> Unless all your RAM is from the same series I do not really believe in mem
> corruption. I would try Martins small VM patch, as it looks like being a bit
> more efficient in low mem conditions and this may well be the case you are
> running into. This means 2.4.17 standard + patch.

     Is there a reasonable chance that martins patch will get mainlined
     in the near future? One of the big reasons I chose to upgrade to a
     later kernel version (from 2.4.8ac<something>+LVMpatches+...) was
     to get away from having to apply patches (and document which
     patches and where to get them etc). 

     If this is the route I have to go, I'll do it but, well, I'm not
     that comfortable with it. 
    
-- 
Share and Enjoy. 

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [2.4.17/18pre] VM and swap - it's really unusable
  2002-01-07 14:20                 ` Stephan von Krawczynski
@ 2002-01-08  0:36                   ` Rik van Riel
  0 siblings, 0 replies; 49+ messages in thread
From: Rik van Riel @ 2002-01-08  0:36 UTC (permalink / raw)
  To: Stephan von Krawczynski; +Cc: brownfld, linux-kernel

On Mon, 7 Jan 2002, Stephan von Krawczynski wrote:

> > To be clear ... -aa and -rmap should of course also work
> > nicely without swap, no excuses for the bad behaviour
> > shown in Martin's test, but at the moment they simply
> > don't seem tuned for it.
>
> Good to hear we agree it _should_ work. When does it (rmap)?
> ;-)

I integrated Ed Tomlinson's patch today and have made
one more small change. In the patches I ran here things
worked fine, the system avoids OOM now.

Problem is, it doesn't seem to want to run the OOM
killer when needed, at least not any time soon. I need
to check out this code again later.

Anyway, rmap-11 should work fine for your test. ;)

regards,

Rik
-- 
Shortwave goes a long way:  irc.starchat.net  #swl

http://www.surriel.com/		http://distro.conectiva.com/

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [2.4.17/18pre] VM and swap - it's really unusable
  2002-01-07 20:29               ` Petro
@ 2002-01-08  1:43                 ` Stephan von Krawczynski
  2002-01-08  3:10                   ` Petro
  0 siblings, 1 reply; 49+ messages in thread
From: Stephan von Krawczynski @ 2002-01-08  1:43 UTC (permalink / raw)
  To: Petro; +Cc: andihartmann, linux-kernel

> On Mon, Jan 07, 2002 at 03:33:48PM +0100, Stephan von Krawczynski wrote:
> > mysql question: is this a binary from some distro or self-compiled? If
> > self-compiled can you show your ./configure paras, please?
>
>     It's the binary from mysql.com.

Beta or stable release?

> > [...] I would try Martins small VM patch, as it looks like being a bit
> > more efficient in low mem conditions and this may well be the case you are
> > running into. This means 2.4.17 standard + patch.
>
>      Is there a reasonable chance that martins patch will get mainlined
>      in the near future?

I really can't know. But to me the results look interesting enough to
give it a try on certain problem situations (like yours) to find out
if it is any better than the stock version. If you and others can
confirm that things get better then I have no real doubts that Marcelo
can pick it up.

> One of the big reasons I chose to upgrade to a
>      later kernel version (from 2.4.8ac<something>+LVMpatches+...) was
>      to get away from having to apply patches (and document which
>      patches and where to get them etc).

Well, there is really nothing wrong with upgrading mainline kernels,
as they are getting better with every release, so I would always
suggest to take the releases up let's say a week after being out. Only
your situation maybe can help to improve more, if you input some of
your experiences in LKML with a patch like Martins. Feedback _is_
required to find a solution to an existing problem.

>      If this is the route I have to go, I'll do it but, well, I'm not
>      that comfortable with it.

Well, my suggestion: don't patch around too much, but try single
patches on stock kernel and evaluate them here.

Regards,
Stephan

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [2.4.17/18pre] VM and swap - it's really unusable
  2002-01-08  1:43                 ` Stephan von Krawczynski
@ 2002-01-08  3:10                   ` Petro
  2002-01-08  6:00                     ` Petro
  0 siblings, 1 reply; 49+ messages in thread
From: Petro @ 2002-01-08  3:10 UTC (permalink / raw)
  To: Stephan von Krawczynski; +Cc: andihartmann, linux-kernel

On Tue, Jan 08, 2002 at 02:43:42AM +0100, Stephan von Krawczynski wrote:
> > On Mon, Jan 07, 2002 at 03:33:48PM +0100, Stephan von Krawczynski wrote:
> > > mysql question: is this a binary from some distro or self-compiled? If
> > > self-compiled can you show your ./configure paras, please?
> >
> >     It's the binary from mysql.com.
>
> Beta or stable release?

    Stable. 

> > > more efficient in low mem conditions and this may well be the case you are                                                               
> > > running into. This means 2.4.17 standard + patch.                 
> >      Is there a reasonable chance that martins patch will get mainlined                                                             
> >      in the near future?                                            
>                                                                       
> I really can't know. But to me the results look interesting enough to 
> give it a try on certain problem situations (like yours) to find out  
> if it is any better than the stock version. If you and others can     
> confirm that things get better then I have no real doubts that Marcelo
> can pick it up. 

    Out of ignorance and laziness, where is it again that I can get this
    kernel? 

> > One of the big reasons I chose to upgrade to a
> >      later kernel version (from 2.4.8ac<something>+LVMpatches+...) was
> >      to get away from having to apply patches (and document which
> >      patches and where to get them etc).
>                                                                       
> Well, there is really nothing wrong with upgrading mainline kernels,

    Funny, I went from a working 2.4.8-ac<x> to a non-working
    2.4.13+patches when I started getting these crashes. At first I
    thought they were Mysql, so I called them. They said "Re-install
    windows", er, I mean upgrade my kernel to 2.4.16, which would "fix
    the problem", so I did, and it didn't. So they said to go to
    2.4.17rc2 as that would fix my problem, only it didn't. 


> as they are getting better with every release, so I would always
> suggest to take the releases up let's say a week after being out. Only

    Yeah, and build a debian package, distribute it to (looks behind me)
    100+ linux servers, including 6 mission critical heavily loaded DB
    machines. 

    Not to be a complete asswipe, but no. While I like playing with
    computers and all that, I don't have enough hours in the day to be
    rolling out new kernels every couple weeks and still have time left
    over to see my wife, shoot my guns, ride my motorcycles and drink my
    scotch. 

> your situation maybe can help to improve more, if you input some of   
> your experiences in LKML with a patch like Martins. Feedback _is_     
> required to find a solution to an existing problem.                   

    I understand completely, I'm just trying to figure out a way to test
    this that doesn't impact my site as drastically. See, we've only got
    two databases that will cause this fault, and of course they are the
    two most important ones, and the only way we can generate this fault
    is to put them live and wait for them to crash. 

> >      If this is the route I have to go, I'll do it but, well, I'm not
> >      that comfortable with it.
>                                                                       
> Well, my suggestions: don't patch around too much, but try single     
> patches on stock kernel and evaluate them here.                       

    There are 2 other patches I need to apply, the first is the LVM
    1.0.1 patch, and the second is the VFS-lock patch. We need these to
    do snapshots. Which isn't bad, but I'm about the only one still here 
    who can do it (violates hit-by-a-bus rule).

-- 
Share and Enjoy. 

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [2.4.17/18pre] VM and swap - it's really unusable
  2002-01-06 15:48               ` Stephan von Krawczynski
@ 2002-01-08  5:09                 ` Ken Brownfield
  0 siblings, 0 replies; 49+ messages in thread
From: Ken Brownfield @ 2002-01-08  5:09 UTC (permalink / raw)
  To: Stephan von Krawczynski; +Cc: linux-kernel

On Sun, Jan 06, 2002 at 04:48:13PM +0100, Stephan von Krawczynski wrote:
[...]
| I read all your LKML mails since beginning of November, could find a lot about
| cpu, configs,tops etc but not a single "cat /proc/interrupts" together with
| uptime.

http://web.irridia.com/info/linux/APIC/

This was published back in the beginning (4/2001), and additional stuff
sent to Alan and Manfred for debugging.  I was pushing my problem on
LKML for a couple of weeks, but without much feedback I'm sticking to my
workaround.

This also feeds back to my earlier thoughts on some kind of LKML summary
page of patches and problem reports for those disinclined to wade
through the high LKML traffic.  It's hard for me, much less you, to go
back through the archives manually...

| According to the ongoings of your mails you seem to try really a lot of things
| to make it work out. I recommend not to intermix the patches a lot. I would
| stay close to marcelo's tree and try _single_ small patches on top of that. If
| you mix them up (even only two of them) you won't be able to track down very
| well, what is really better or worse.

Actually, that's why I don't test -aa.  Whatever Marcelo chooses to
include, I'll trust it in its entirety.  But I've tested, for example,
Linus' locked memory patch, and a couple of Andrew's isolated patches,
all applied to mainline with nothing else.  I can't try -aa because it
has interdependencies and unintentional (I assume) backouts of code.

| One thing I would like to ask here is this (as you are dealing with oracle
| stuff): why does oracle recommend to compile the kernel in 486 mode? I talked
| to someone who uses oracle on 2.4.x and he told me it is even in the latest
| docs. What is the voodoo behind that? Btw he has no freezes or the like, but
| occasional coredumps from oracle processes, which he states as "not nice, but
| no showstopper" as his clients reconnect/retransmit with only a slight delay.
| This may be related to VM, that's why I will try to convince him of some patches
| :-) and have a look at the coredump-frequency.

I haven't had any problems with Oracle at all since Linus' locked memory
patch back in the 2.4.14-15ish days.  This on a 4GB 6-way Xeon with
ext2, reiser, couple of other complications, with the kernel compiled
for P3.  I really don't know what would cause Oracle to misbehave with
an i686 kernel that wouldn't be a kernel bug.

Perhaps a gcc-related bug?  I'm still using 2.91.66 for kernels,
although I've used 2.95.x with no problems.  I'm not touching 2.96.x
with a ten-foot pole, waiting instead for a sane 3.x one of these years.

I think Oracle (the company) is a little short of tooth on Linux
experience, since for example and AFAIK they never discovered the fatal
2.4 locked memory problem -- that took Google's report and to a much
lesser extent my later discovery of the same problem.

-- 
Ken.
brownfld@irridia.com

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [2.4.17/18pre] VM and swap - it's really unusable
  2002-01-08  3:10                   ` Petro
@ 2002-01-08  6:00                     ` Petro
  0 siblings, 0 replies; 49+ messages in thread
From: Petro @ 2002-01-08  6:00 UTC (permalink / raw)
  To: Stephan von Krawczynski; +Cc: andihartmann, linux-kernel

On Mon, Jan 07, 2002 at 07:10:01PM -0800, Petro wrote:
> > can pick it up. 
> 
>     Out of ignorance and laziness, where is it again that I can get this
>     kernel? 

    Let me rephrase that. 

    Out of ignorance and laziness, exactly which patch is it that I
    need, and where can I find it? 

-- 
Share and Enjoy. 

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Update Re: [2.4.17/18pre] VM and swap - it's really unusable
  2002-01-05 15:08           ` Stephan von Krawczynski
  2002-01-05 21:40             ` Ken Brownfield
  2002-01-07  1:42             ` Rik van Riel
@ 2002-01-08 15:19             ` Ken Brownfield
  2 siblings, 0 replies; 49+ messages in thread
From: Ken Brownfield @ 2002-01-08 15:19 UTC (permalink / raw)
  To: Stephan von Krawczynski, M.H.VanLeeuwen, akpm; +Cc: linux-kernel

I stayed at work all night banging out tests on a few of our machines
here.  I took 2.4.18-pre2 and 2.4.18-pre2 with the vmscan patch from
"M.H.VanLeeuwen" <vanl@megsinet.net>.

My sustained test consisted of this type of load:

	ls -lR / > /dev/null &
	/usr/bin/slocate -u -f "nfs,smbfs,ncpfs,proc,devpts" -e "/tmp,/var/tmp,/usr/tmp,/afs,/net" &
	dd if=/dev/sda3 of=/sda3 bs=1024k &
	# Hit TUX on this machine repeatedly; html page with 1000 images
	# Wait for memory to be mostly used by buff/page cache
	./a.out &
	# repeat finished commands -- keep all commands running
	# after a.out finishes, allow buff/page to refill before repeating

The a.out in this case is a little program (attached, c.c) to allocate
and write to an amount of memory equal to physical RAM.  The example I
chose below is from a 2xP3/600 with 1GB of RAM and 2GB swap.
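
For readers without the attachment, a minimal sketch of the kind of
program described, not the attached c.c itself. The function names are
invented for illustration, and sysconf(_SC_PHYS_PAGES) is a glibc
extension rather than POSIX, so its availability is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Physical RAM in bytes, or 0 if it cannot be determined.
 * Note: on a 32-bit build with more than 4 GB of RAM the product
 * does not fit in size_t; this sketch ignores that case.
 */
size_t physical_ram_bytes(void)
{
    long pages = sysconf(_SC_PHYS_PAGES);
    long page_size = sysconf(_SC_PAGESIZE);

    if (pages <= 0 || page_size <= 0)
        return 0;
    return (size_t)pages * (size_t)page_size;
}

/*
 * Allocate `bytes` of memory and write to all of it, so that every
 * page is really backed by RAM or swap. Returns the number of bytes
 * written, or 0 if the allocation failed or bytes was 0.
 */
size_t allocate_and_touch(size_t bytes)
{
    unsigned char *buf;
    volatile unsigned char *check;
    size_t written;

    if (bytes == 0)
        return 0;

    buf = malloc(bytes);
    if (buf == NULL)
        return 0;

    memset(buf, 0xA5, bytes);

    /* Read back through a volatile pointer so the writes are kept. */
    check = buf;
    written = (check[0] == 0xA5 && check[bytes - 1] == 0xA5) ? bytes : 0;

    free(buf);
    return written;
}
```

A main() that calls allocate_and_touch(physical_ram_bytes()) would play
the role of a.out in the load above. With memory overcommit the malloc
itself may succeed even when the later writes cannot all be satisfied.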

This was not a formal benchmark -- I think benchmarks have been
presented before by other folks, and looking at benchmarks does not
necessarily indicate the real-world problems that exist.  My intent was
to reproduce the issues I've been seeing, and then apply the MH (and
only the MH) patch and observe.

2.4.18-pre2

Once slocate starts and gets close to filling RAM with buffer/page
cache, kupdated and kswapd have periodic spikes of 50-100% CPU.

When a.out starts, kswapd and kupdated begin to eat significant portions
of CPU (20-100%) and I/O becomes more and more sluggish as a.out
allocates.

When a.out uses all free RAM and should begin eating cache, significant
swapping begins and cache is not decreased significantly until the
machine goes 100-200MB into swap.

Here are two readprofile outputs, sorted by ticks and load.

229689 default_idle                             4417.0962
  4794 file_read_actor                           18.4385
   405 __rdtsc_delay                             14.4643
  3763 do_anonymous_page                         14.0410
  3796 statm_pgd_range                            9.7835
  1535 prune_icache                               6.9773
   153 __free_pages                               4.7812
  1420 create_bounce                              4.1765
   583 sym53c8xx_intr                             3.9392
   221 atomic_dec_and_lock                        2.7625
  5214 generic_file_write                         2.5659

273464 total                                      0.1903
234168 default_idle                             4503.2308
  5298 generic_file_write                         2.6073
  4868 file_read_actor                           18.7231
  3799 statm_pgd_range                            9.7912
  3763 do_anonymous_page                         14.0410
  1535 prune_icache                               6.9773
  1526 shrink_cache                               1.6234
  1469 create_bounce                              4.3206
   643 rmqueue                                    1.1320
   591 sym53c8xx_intr                             3.9932
   505 __make_request                             0.2902

2.4.18-pre2 with MH

With the MH patch applied, the issues I witnessed above did not seem to
reproduce.  Memory allocation under pressure seemed faster and smoother.
kswapd never went above 5-15% CPU.  When a.out allocated memory, it did
not begin swapping until buffer/page cache had been nearly completely
cannibalized.

And when a.out caused swapping, it was controlled and behaved like you
would expect the VM to behave -- slowly swapping out unused pages
instead of the large swap write-outs seen without the patch.

Martin, have you done throughput benchmarks with MH/rmap/aa, BTW?

But both kernels still seem to be sluggish when it comes to doing small
I/O operations (vi, ls, etc) during heavy swapping activity.

Here are the readprofile results:

206243 default_idle                             3966.2115
  6486 file_read_actor                           24.9462
   409 __rdtsc_delay                             14.6071
  2798 do_anonymous_page                         10.4403
   185 __free_pages                               5.7812
  1846 statm_pgd_range                            4.7577
   469 sym53c8xx_intr                             3.1689
   176 atomic_dec_and_lock                        2.2000
   349 end_buffer_io_async                        1.9830
   492 refill_inactive                            1.8358
    94 system_call                                1.8077

245776 total                                      0.1710
216238 default_idle                             4158.4231
  6486 file_read_actor                           24.9462
  2799 do_anonymous_page                         10.4440
  1855 statm_pgd_range                            4.7809
  1611 generic_file_write                         0.7928
   839 __make_request                             0.4822
   820 shrink_cache                               0.7374
   540 rmqueue                                    0.9507
   534 create_bounce                              1.5706
   492 refill_inactive                            1.8358
   487 sym53c8xx_intr                             3.2905

There may be significant differences in the profile outputs for those
with VM fu.  

Summary: MH swaps _after_ cache has been properly cannibalized, and
swapping activity starts when expected and is properly throttled.
kswapd and kupdated don't seem to go into berserk 100% CPU mode.

At any rate, I now have the MH patch (and Andrew Morton's mini-ll and
read-latency2 patches) in production, and I like what I see so far.  I'd
vote for them to go into 2.4.18, IMHO.  Maybe the full low-latency patch
if it's not truly 2.5 material.

My next cook-off will be with -aa and rmap, although if the rather small
MH patch fixes my last issues it may be worth putting all VM effort into
a 2.5 VM cook-off. :)  Hopefully the useful stuff in -aa can get pulled
in at some point soon, though.

Thanks much to Martin H. VanLeeuwen for his patch and Stephan von
Krawczynski for his recommendations.  I'll let MH cook for a while and
I'll follow up later.
-- 
Ken.
brownfld@irridia.com

c.c:

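#include <stdio.h>
#include <stdlib.h>	/* malloc, exit */
#include <string.h>	/* memcpy */
#include <unistd.h>	/* sleep */

#define MB_OF_RAM 1024

int
main(void)
{
	long stuffsize = (long)MB_OF_RAM * 1048576 ;
	char *stuff ;

	if ( (stuff = (char *)malloc( stuffsize )) != NULL ) {
		long chunksize = 1048576 ;
		long c ;

		/* zero the first chunk byte by byte so its pages are touched */
		for ( c=0 ; c<chunksize ; c++ )
			*(stuff+c) = '\0' ;
		/* copy it over every following full chunk to touch the rest;
		   a partial last chunk (stuffsize%chunksize != 0) is skipped */
		for ( ; (c+chunksize)<=stuffsize ; c+=chunksize )
			memcpy( stuff+c, stuff, chunksize );

		sleep( 120 );
	}
	else
		printf("OOPS\n");

	exit( 0 );
}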

On Sat, Jan 05, 2002 at 04:08:33PM +0100, Stephan von Krawczynski wrote:
[...]
| I am pretty impressed by Martin's test case, where nearly all VM patches fail
| with the exception of his own :-) The thing is, this test is not of a "very
| special" nature but more like "system driven to its limit by normal processes".
| And this is the really interesting part about it.
[...]

* Re: [2.4.17/18pre] VM and swap - it's really unusable
  2002-01-03 20:23 ` Ken Brownfield
                     ` (2 preceding siblings ...)
  2002-01-04  0:19   ` Stephan von Krawczynski
@ 2002-01-11 20:41   ` Ken Brownfield
  2002-01-11 21:13     ` Mark Hahn
                       ` (2 more replies)
  3 siblings, 3 replies; 49+ messages in thread
From: Ken Brownfield @ 2002-01-11 20:41 UTC (permalink / raw)
  To: vanl; +Cc: linux-kernel

After more testing, my original observations seem to be holding up,
except that under heavy VM load (e.g., "make -j bzImage") the machine's
overall performance seems far lower.  For instance, without the patch
the -j build finishes in ~10 minutes (2x933P3/256MB) but with the patch
I haven't had the patience to let it finish after more than an hour.

This is perhaps because the vmscan patch is too aggressively shrinking
the caches, or causing thrashing in another area?  I'm also noticing
that the amount of swap used is nearly an order of magnitude higher,
which doesn't make sense at first glance...  Also, there are extended
periods where idle CPU is 50-80%.

Maybe the patch or at least its intent can be merged with Andrea's work
if applicable?

Thanks,
-- 
Ken.
brownfld@irridia.com

On Thu, Jan 03, 2002 at 02:23:01PM -0600, Ken Brownfield wrote:
| Unfortunately, I lost the response that basically said "2.4 looks stable
| to me", but let me count the ways in which I agree with Andreas'
| sentiment:
| 
| 	A) VM has major issues
| 		1) about a dozen recent OOPS reports in VM code
| 		2) VM falls down on large-memory machines with a
| 		   high inode count (slocate/updatedb, i/dcache)
| 		3) Memory allocation failures and OOM triggers
| 		   even though caches remain full.
| 		4) Other bugs fixed in -aa and others
| 	B) Live- and dead-locks that I'm seeing on all 2.4 production
| 	   machines > 2.4.9, possibly related to A.  But how will I
| 	   ever find out?
| 	C) IO-APIC code that requires noapic on any and all SMP
| 	   machines that I've ever run on.
| 
| I don't have anything against anyone here -- I think everyone is doing a
| fine job.  It's an issue of acceptance of the problem and focus.  These
| issues are all showstoppers for me, and while I don't represent the 90%
| of the Linux market that is UP desktops, IMHO future work on the kernel
| will be degraded by basic functionality that continues to cause
| problems.
| 
| I think seeing some of Andrea's and Andrew's et al patches actually
| *happen* would be a good thing, since 2.4 kernels are decidedly not
| ready for production here.  I am forced to apply 26 distinct patch sets
| to my kernels, and I am NOT the right person to make these judgements.
| Which is why I was interested in an LKML summary source, though I
| haven't yet had a chance to catch up on that thread of comment.
| 
| Having a glitch in the radeon driver is one thing; having persistent,
| fatal, and reproducible failures in universal kernel code is entirely
| another.
| 
| -- 
| Ken.
| brownfld@irridia.com
| 
| 
| On Fri, Dec 28, 2001 at 09:16:38PM +0100, Andreas Hartmann wrote:
| | Hello all,
| | 
| | Again, I did a rsync-operation as described in
| | "[2.4.17rc1] Swapping" MID <3C1F4014.2010705@athlon.maya.org>.
| | 
| | This time, the kernel had a swappartition which was about 200MB. As the 
| | swap-partition was fully used, the kernel killed all processes of knode.
| | Nearly 50% of RAM had been used for buffers at this moment. Why is there 
| | so much memory used for buffers?
| | 
| | I know I repeat it, but please:
| | 
| | 	Fix the VM-management in kernel 2.4.x. It's unusable. Believe
| | 	me! As comparison: kernel 2.2.19 didn't need nearly any swap for
| | 	the same operation!
| | 
| | Please consider that I'm using 512 MB of RAM. This should, or better: 
| | must be enough to do the rsync-operation nearly without any swapping - 
| | kernel 2.2.19 does it!
| | 
| | The performance of kernel 2.4.18pre1 is very poor, which is no surprise, 
| | because the machine swaps nearly nonstop.
| | 
| | 
| | Regards,
| | Andreas Hartmann
| | 
| | -
| | To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
| | the body of a message to majordomo@vger.kernel.org
| | More majordomo info at  http://vger.kernel.org/majordomo-info.html
| | Please read the FAQ at  http://www.tux.org/lkml/

* Re: [2.4.17/18pre] VM and swap - it's really unusable
  2002-01-11 20:41   ` Ken Brownfield
@ 2002-01-11 21:13     ` Mark Hahn
  2002-01-11 21:38       ` Ken Brownfield
  2002-01-11 23:38       ` Rik van Riel
  2002-01-11 21:23     ` Ken Brownfield
  2002-01-12  0:13     ` M.H.VanLeeuwen
  2 siblings, 2 replies; 49+ messages in thread
From: Mark Hahn @ 2002-01-11 21:13 UTC (permalink / raw)
  To: linux-kernel

> overall performance seems far lower.  For instance, without the patch
> the -j build finishes in ~10 minutes (2x933P3/256MB) but with the patch

please, PLEASE stop using "make -j" 
for anything except the fork-bomb that it is.
pretending that it's a benchmark, especially one 
to guide kernel tuning, is a travesty!

if you want to simulate VM load, do something sane like
boot with mem=32M, or a simple "mmap(lots); mlockall" tool.

regards, mark hahn.

* Re: [2.4.17/18pre] VM and swap - it's really unusable
  2002-01-11 20:41   ` Ken Brownfield
  2002-01-11 21:13     ` Mark Hahn
@ 2002-01-11 21:23     ` Ken Brownfield
  2002-01-12  0:13     ` M.H.VanLeeuwen
  2 siblings, 0 replies; 49+ messages in thread
From: Ken Brownfield @ 2002-01-11 21:23 UTC (permalink / raw)
  To: vanl; +Cc: linux-kernel

Andrew Morton kindly pointed out that my crack pipe is dangerously empty
and I didn't specify what patch I was talking about.  In my defense, I
was up all last night tracking down the ext3 bug that Andrew fixed right
under me. ;)

I replied to the wrong message, which I've pasted below.  This is wrt
Martin's VM patch per the previous discussion.

Apologies,
-- 
Ken.
brownfld@irridia.com


On Fri, Jan 11, 2002 at 02:41:17PM -0600, Ken Brownfield wrote:
| After more testing, my original observations seem to be holding up,
| except that under heavy VM load (e.g., "make -j bzImage") the machine's
| overall performance seems far lower.  For instance, without the patch
| the -j build finishes in ~10 minutes (2x933P3/256MB) but with the patch
| I haven't had the patience to let it finish after more than an hour.
| 
| This is perhaps because the vmscan patch is too aggressively shrinking
| the caches, or causing thrashing in another area?  I'm also noticing
| that the amount of swap used is nearly an order of magnitude higher,
| which doesn't make sense at first glance...  Also, there are extended
| periods where idle CPU is 50-80%.
| 
| Maybe the patch or at least its intent can be merged with Andrea's work
| if applicable?
| 
| Thanks,
| -- 
| Ken.
| brownfld@irridia.com

What I SHOULD have replied to:

| Date:   Tue, 8 Jan 2002 09:19:57 -0600
| From: Ken Brownfield <brownfld@irridia.com>
| To: Stephan von Krawczynski <skraw@ithnet.com>,
|         "M.H.VanLeeuwen" <vanl@megsinet.net>, akpm@zip.com.au
| Cc: linux-kernel@vger.kernel.org
| Subject: Update Re: [2.4.17/18pre] VM and swap - it's really unusable
| User-Agent: Mutt/1.2.5.1i
| In-Reply-To: <20020105160833.0800a182.skraw@ithnet.com>; from skraw@ithnet.com on Sat, Jan 05, 2002 at 04:08:33PM +0100
| Precedence: bulk
| X-Mailing-List:         linux-kernel@vger.kernel.org
| 
| I stayed at work all night banging out tests on a few of our machines
| here.  I took 2.4.18-pre2 and 2.4.18-pre2 with the vmscan patch from
| "M.H.VanLeeuwen" <vanl@megsinet.net>.
| 
| My sustained test consisted of this type of load:
| 
|         ls -lR / > /dev/null &
|         /usr/bin/slocate -u -f "nfs,smbfs,ncpfs,proc,devpts" -e "/tmp,/var/tmp,/usr/tmp,/afs,/net" &
|         dd if=/dev/sda3 of=/sda3 bs=1024k &
|         # Hit TUX on this machine repeatedly; html page with 1000 images
|         # Wait for memory to be mostly used by buff/page cache
|         ./a.out &
|         # repeat finished commands -- keep all commands running
|         # after a.out finishes, allow buff/page to refill before repeating
| 
| The a.out in this case is a little program (attached, c.c) to allocate
| and write to an amount of memory equal to physical RAM.  The example I
| chose below is from a 2xP3/600 with 1GB of RAM and 2GB swap.
| 
| This was not a formal benchmark -- I think benchmarks have been
| presented before by other folks, and looking at benchmarks does not
| necessarily indicate the real-world problems that exist.  My intent was
| to reproduce the issues I've been seeing, and then apply the MH (and
| only the MH) patch and observe.
| 
| 2.4.18-pre2
| 
| Once slocate starts and gets close to filling RAM with buffer/page
| cache, kupdated and kswapd have periodic spikes of 50-100% CPU.
| 
| When a.out starts, kswapd and kupdated begin to eat significant portions
| of CPU (20-100%) and I/O becomes more and more sluggish as a.out
| allocates.
| 
| When a.out uses all free RAM and should begin eating cache, significant
| swapping begins and cache is not decreased significantly until the
| machine goes 100-200MB into swap.
| 
| Here are two readprofile outputs: the first sorted by load (third
| column), the second sorted by ticks (first column).
| 
| 229689 default_idle                             4417.0962
|   4794 file_read_actor                           18.4385
|    405 __rdtsc_delay                             14.4643
|   3763 do_anonymous_page                         14.0410
|   3796 statm_pgd_range                            9.7835
|   1535 prune_icache                               6.9773
|    153 __free_pages                               4.7812
|   1420 create_bounce                              4.1765
|    583 sym53c8xx_intr                             3.9392
|    221 atomic_dec_and_lock                        2.7625
|   5214 generic_file_write                         2.5659
| 
| 273464 total                                      0.1903
| 234168 default_idle                             4503.2308
|   5298 generic_file_write                         2.6073
|   4868 file_read_actor                           18.7231
|   3799 statm_pgd_range                            9.7912
|   3763 do_anonymous_page                         14.0410
|   1535 prune_icache                               6.9773
|   1526 shrink_cache                               1.6234
|   1469 create_bounce                              4.3206
|    643 rmqueue                                    1.1320
|    591 sym53c8xx_intr                             3.9932
|    505 __make_request                             0.2902
| 
| 
| 2.4.18-pre2 with MH
| 
| With the MH patch applied, the issues I witnessed above did not seem to
| reproduce.  Memory allocation under pressure seemed faster and smoother.
| kswapd never went above 5-15% CPU.  When a.out allocated memory, it did
| not begin swapping until buffer/page cache had been nearly completely
| cannibalized.
| 
| And when a.out caused swapping, it was controlled and behaved like you
| would expect the VM to behave -- slowly swapping out unused pages
| instead of the large swap write-outs seen without the patch.
| 
| Martin, have you done throughput benchmarks with MH/rmap/aa, BTW?
| 
| But both kernels still seem to be sluggish when it comes to doing small
| I/O operations (vi, ls, etc) during heavy swapping activity.
| 
| Here are the readprofile results:
| 
| 206243 default_idle                             3966.2115
|   6486 file_read_actor                           24.9462
|    409 __rdtsc_delay                             14.6071
|   2798 do_anonymous_page                         10.4403
|    185 __free_pages                               5.7812
|   1846 statm_pgd_range                            4.7577
|    469 sym53c8xx_intr                             3.1689
|    176 atomic_dec_and_lock                        2.2000
|    349 end_buffer_io_async                        1.9830
|    492 refill_inactive                            1.8358
|     94 system_call                                1.8077
| 
| 245776 total                                      0.1710
| 216238 default_idle                             4158.4231
|   6486 file_read_actor                           24.9462
|   2799 do_anonymous_page                         10.4440
|   1855 statm_pgd_range                            4.7809
|   1611 generic_file_write                         0.7928
|    839 __make_request                             0.4822
|    820 shrink_cache                               0.7374
|    540 rmqueue                                    0.9507
|    534 create_bounce                              1.5706
|    492 refill_inactive                            1.8358
|    487 sym53c8xx_intr                             3.2905
| 
| 
| There may be significant differences in the profile outputs for those
| with VM fu.  
| 
| Summary: MH swaps _after_ cache has been properly cannibalized, and
| swapping activity starts when expected and is properly throttled.
| kswapd and kupdated don't seem to go into berserk 100% CPU mode.
| 
| At any rate, I now have the MH patch (and Andrew Morton's mini-ll and
| read-latency2 patches) in production, and I like what I see so far.  I'd
| vote for them to go into 2.4.18, IMHO.  Maybe the full low-latency patch
| if it's not truly 2.5 material.
| 
| My next cook-off will be with -aa and rmap, although if the rather small
| MH patch fixes my last issues it may be worth putting all VM effort into
| a 2.5 VM cook-off. :)  Hopefully the useful stuff in -aa can get pulled
| in at some point soon, though.
| 
| Thanks much to Martin H. VanLeeuwen for his patch and Stephan von
| Krawczynski for his recommendations.  I'll let MH cook for a while and
| I'll follow up later.
| -- 
| Ken.
| brownfld@irridia.com
| 
| c.c:
| 
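| #include <stdio.h>
| #include <stdlib.h>     /* malloc, exit */
| #include <string.h>     /* memcpy */
| #include <unistd.h>     /* sleep */
| 
| #define MB_OF_RAM 1024
| 
| int
| main(void)
| {
|         long stuffsize = (long)MB_OF_RAM * 1048576 ;
|         char *stuff ;
| 
|         if ( (stuff = (char *)malloc( stuffsize )) != NULL ) {
|                 long chunksize = 1048576 ;
|                 long c ;
| 
|                 /* zero the first chunk byte by byte so its pages are touched */
|                 for ( c=0 ; c<chunksize ; c++ )
|                         *(stuff+c) = '\0' ;
|                 /* copy it over every following full chunk to touch the rest;
|                    a partial last chunk (stuffsize%chunksize != 0) is skipped */
|                 for ( ; (c+chunksize)<=stuffsize ; c+=chunksize )
|                         memcpy( stuff+c, stuff, chunksize );
| 
|                 sleep( 120 );
|         }
|         else
|                 printf("OOPS\n");
| 
|         exit( 0 );
| }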
| 
| 
| On Sat, Jan 05, 2002 at 04:08:33PM +0100, Stephan von Krawczynski wrote:
| [...]
| | I am pretty impressed by Martin's test case, where nearly all VM patches fail
| | with the exception of his own :-) The thing is, this test is not of a "very
| | special" nature but more like "system driven to its limit by normal processes".
| | And this is the really interesting part about it.
| [...]
| 

* Re: [2.4.17/18pre] VM and swap - it's really unusable
  2002-01-11 21:13     ` Mark Hahn
@ 2002-01-11 21:38       ` Ken Brownfield
  2002-01-11 23:38       ` Rik van Riel
  1 sibling, 0 replies; 49+ messages in thread
From: Ken Brownfield @ 2002-01-11 21:38 UTC (permalink / raw)
  To: Mark Hahn; +Cc: linux-kernel

I don't think I made the claim that this was a benchmark -- I certainly
realize that "make -j bzImage" is not real-world, but it is clearly
indicative of heavy VM/CPU/context load.  Since I don't believe this
patch is currently in the running for inclusion, I'm just giving general
feedback to the patch author rather than making a case.

For instance, "make -j bzImage" reproduced the ext3 bug that Andrew
found where my other VM-intensive apps did not.  I doubt we should keep
the bug in the kernel because the situation isn't real-world enough.

But yes, a bug is worse than a behavior flaw, granted.
-- 
Ken.
brownfld@irridia.com

On Fri, Jan 11, 2002 at 04:13:00PM -0500, Mark Hahn wrote:
| > overall performance seems far lower.  For instance, without the patch
| > the -j build finishes in ~10 minutes (2x933P3/256MB) but with the patch
| 
| please, PLEASE stop using "make -j" 
| for anything except the fork-bomb that it is.
| pretending that it's a benchmark, especially one 
| to guide kernel tuning, is a travesty!
| 
| if you want to simulate VM load, do something sane like
| boot with mem=32M, or a simple "mmap(lots); mlockall" tool.
| 
| regards, mark hahn.
| 

* Re: [2.4.17/18pre] VM and swap - it's really unusable
  2002-01-11 21:13     ` Mark Hahn
  2002-01-11 21:38       ` Ken Brownfield
@ 2002-01-11 23:38       ` Rik van Riel
  1 sibling, 0 replies; 49+ messages in thread
From: Rik van Riel @ 2002-01-11 23:38 UTC (permalink / raw)
  To: Mark Hahn; +Cc: linux-kernel

On Fri, 11 Jan 2002, Mark Hahn wrote:

> > overall performance seems far lower.  For instance, without the patch
> > the -j build finishes in ~10 minutes (2x933P3/256MB) but with the patch
>
> please, PLEASE stop using "make -j"
> for anything except the fork-bomb that it is.
> pretending that it's a benchmark, especially one
> to guide kernel tuning, is a travesty!

Actually, it's as good a benchmark as any. Knowing
how well the system is able to recover from heavy
overload situations is useful to know if your
server gets heavily overloaded at times.

If one VM falls over horribly under half the load
it takes to make another VM go slower, I know which
one I'd want on my server.

> if you want to simulate VM load, do something sane like
> boot with mem=32M, or a simple "mmap(lots); mlockall" tool.

... and then you come up with something WAY less
realistic than 'make -j' ;)))

cheers,

Rik
-- 
"Linux holds advantages over the single-vendor commercial OS"
    -- Microsoft's "Competing with Linux" document

http://www.surriel.com/		http://distro.conectiva.com/


* Re: [2.4.17/18pre] VM and swap - it's really unusable
  2002-01-11 20:41   ` Ken Brownfield
  2002-01-11 21:13     ` Mark Hahn
  2002-01-11 21:23     ` Ken Brownfield
@ 2002-01-12  0:13     ` M.H.VanLeeuwen
  2 siblings, 0 replies; 49+ messages in thread
From: M.H.VanLeeuwen @ 2002-01-12  0:13 UTC (permalink / raw)
  To: Ken Brownfield; +Cc: linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1395 bytes --]

Ken Brownfield wrote:
> 
> After more testing, my original observations seem to be holding up,
> except that under heavy VM load (e.g., "make -j bzImage") the machine's
> overall performance seems far lower.  For instance, without the patch
> the -j build finishes in ~10 minutes (2x933P3/256MB) but with the patch
> I haven't had the patience to let it finish after more than an hour.
> 
> This is perhaps because the vmscan patch is too aggressively shrinking
> the caches, or causing thrashing in another area?  I'm also noticing
> that the amount of swap used is nearly an order of magnitude higher,
> which doesn't make sense at first glance...  Also, there are extended
> periods where idle CPU is 50-80%.
> 
> Maybe the patch or at least its intent can be merged with Andrea's work
> if applicable?
> 
> Thanks,
> --
> Ken.
> brownfld@irridia.com
> 


Ken,

Attached is an update to my previous vmscan.patch.2.4.17.c

Version "d" fixes a BUG due to a race in the old code _and_
is much less aggressive at cache shrinkage or, conversely, more
willing to swap out, but not as much as the stock kernel.

It continues to work well wrt high vm pressure.

Give it a whirl to see if it changes your "-j" symptoms.

If you like you can change the one line in the patch
from "DEF_PRIORITY" which is "6" to progressively smaller
values to "tune" whatever kind of swap_out behaviour you
like.

Martin

[-- Attachment #2: vmscan.patch.2.4.17.d --]
[-- Type: application/octet-stream, Size: 1325 bytes --]

--- linux.virgin/mm/vmscan.c	Mon Dec 31 12:46:25 2001
+++ linux/mm/vmscan.c	Fri Jan 11 18:03:05 2002
@@ -394,9 +394,9 @@
 		if (PageDirty(page) && is_page_cache_freeable(page) && page->mapping) {
 			/*
 			 * It is not critical here to write it only if
-			 * the page is unmapped beause any direct writer
+			 * the page is unmapped because any direct writer
 			 * like O_DIRECT would set the PG_dirty bitflag
-			 * on the phisical page after having successfully
+			 * on the physical page after having successfully
 			 * pinned it and after the I/O to the page is finished,
 			 * so the direct writes to the page cannot get lost.
 			 */
@@ -480,11 +480,14 @@
 
 			/*
 			 * Alert! We've found too many mapped pages on the
-			 * inactive list, so we start swapping out now!
+			 * inactive list.
+			 * Move referenced pages to the active list.
 			 */
-			spin_unlock(&pagemap_lru_lock);
-			swap_out(priority, gfp_mask, classzone);
-			return nr_pages;
+			if (PageReferenced(page) && !PageLocked(page)) {
+				del_page_from_inactive_list(page);
+				add_page_to_active_list(page);
+			}
+			continue;
 		}
 
 		/*
@@ -521,6 +524,9 @@
 	}
 	spin_unlock(&pagemap_lru_lock);
 
+	if (max_mapped <= 0 && (nr_pages > 0 || priority < DEF_PRIORITY))
+		swap_out(priority, gfp_mask, classzone);
+
 	return nr_pages;
 }
 

Thread overview: 49+ messages
2001-12-28 20:16 [2.4.17/18pre] VM and swap - it's really unusable Andreas Hartmann
2001-12-28 20:32 ` Rik van Riel
     [not found] ` <3C2CD9EC.1D6C798E@zip.com.au>
2001-12-28 21:26   ` Andreas Hartmann
2001-12-29  0:30 ` Alan Cox
2001-12-29 13:14 ` Andreas Hartmann
2001-12-29 15:15 ` Andrea Arcangeli
2002-01-03 20:23 ` Ken Brownfield
2002-01-03 20:50   ` Rik van Riel
2002-01-03 21:54   ` Andrew Morton
2002-01-04  4:56     ` Ken Brownfield
2002-01-04  0:19   ` Stephan von Krawczynski
2002-01-04  5:26     ` Ken Brownfield
2002-01-04  8:06       ` Ville Herva
2002-01-04 13:05         ` Stephan von Krawczynski
2002-01-04 13:03       ` Stephan von Krawczynski
2002-01-04 23:50         ` Ken Brownfield
2002-01-05 15:08           ` Stephan von Krawczynski
2002-01-05 21:40             ` Ken Brownfield
2002-01-06 15:48               ` Stephan von Krawczynski
2002-01-08  5:09                 ` Ken Brownfield
2002-01-07  1:42             ` Rik van Riel
2002-01-07  2:22               ` Rik van Riel
2002-01-07 14:20                 ` Stephan von Krawczynski
2002-01-08  0:36                   ` Rik van Riel
2002-01-08 15:19             ` Update " Ken Brownfield
2002-01-04 20:15     ` Andreas Hartmann
2002-01-04 20:55       ` Stephan von Krawczynski
2002-01-05  8:39         ` Andreas Hartmann
2002-01-05 12:59           ` M. Edward (Ed) Borasky
2002-01-05 15:09             ` Andreas Hartmann
2002-01-05 17:51               ` M. Edward (Ed) Borasky
2002-01-06 15:51             ` vda
2002-01-06 19:16               ` M. Edward (Ed) Borasky
2002-01-06 19:38                 ` Alan Cox
2002-01-07  0:47                   ` M. Edward Borasky
2002-01-05  9:24       ` Petro
2002-01-05 15:44         ` Stephan von Krawczynski
2002-01-07  7:15           ` Petro
2002-01-07 14:33             ` Stephan von Krawczynski
2002-01-07 20:29               ` Petro
2002-01-08  1:43                 ` Stephan von Krawczynski
2002-01-08  3:10                   ` Petro
2002-01-08  6:00                     ` Petro
2002-01-11 20:41   ` Ken Brownfield
2002-01-11 21:13     ` Mark Hahn
2002-01-11 21:38       ` Ken Brownfield
2002-01-11 23:38       ` Rik van Riel
2002-01-11 21:23     ` Ken Brownfield
2002-01-12  0:13     ` M.H.VanLeeuwen
