* kswapd, kupdated, and bdflush at 99% under intense IO
@ 2001-04-10 20:01 Jeff Lessem
  2001-04-10 20:25 ` Phil Oester

0 siblings, 1 reply; 5+ messages in thread
From: Jeff Lessem @ 2001-04-10 20:01 UTC (permalink / raw)
To: linux-kernel

My machine is an 8-processor Dell P-III 700MHz with 8GB of memory. The disk system I am using is a 12-drawer JBOD with 5 disks in a RAID 5 arrangement, attached to an AMI MegaRAID 438/466/467/471/493 controller, with a total of 145GB of space.

The machine has been in use for about 6 months doing primarily CPU- and memory-intensive scientific computing tasks. It has been very stable in this role and everybody involved has been pleased with its performance. Recently a decision was made to conglomerate people's home directories from around the network and put them all on this machine (hence the JBOD and RAID).

These tests are all being done with Linux 2.4.3 plus the bigpatch fix for knfsd and quotas. The rest of the OS is Debian unstable.

Before moving the storage into production I am performing tests on it to gauge its stability. The first test I performed was a single "bonnie++ -s 16096" instance, and the timing results are in line with what I would expect from fast SCSI disks. However, multiple instances of bonnie++ completely kill the machine. Once two or three bonnies are running, kswapd, kupdated, and bdflush each jump to using 99% of a CPU and the machine becomes incredibly unresponsive. Even using a root shell at nice -20, it can take several minutes for "killall bonnie++" to appear after being typed and then run. After the bonnies are killed and kswapd, kupdated, and bdflush are given a minute or two to finish whatever they are doing, the machine becomes responsive again.

I don't think the machine should be behaving like this. I certainly expect some slowdowns with that much IO, but the computer should still be reasonably responsive, particularly because no system or user files that need to be accessed are on that channel of the SCSI controller.

Any advice on approaching this problem would be appreciated. I will try my best to provide any debugging information that would be useful, but the machine is on another continent from me, so without a serial console I have a hard time getting any information that doesn't make it into a logfile.

--
Thanks, Jeff Lessem.
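The load pattern Jeff describes can be approximated on any box without bonnie++ itself; a minimal sketch, using dd as a stand-in for the parallel sequential writers (the file count, sizes, and paths are illustrative, not from the report — the actual test was several concurrent "bonnie++ -s 16096" runs):

```shell
# Rough stand-in for "two or three bonnies": several sequential writers
# hammering the same filesystem in parallel.
dir=$(mktemp -d)
for i in 1 2 3; do
    # Each background job writes a 4MB file of zeroes.
    dd if=/dev/zero of="$dir/writer-$i" bs=1M count=4 2>/dev/null &
done
wait    # block until all writers have finished
ls "$dir"
```

With real bonnie++, each loop iteration would instead launch `bonnie++ -s 16096 -d "$dir" &` so the working set exceeds RAM and forces the buffer cache through bdflush.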
* RE: kswapd, kupdated, and bdflush at 99% under intense IO
  2001-04-10 20:01 kswapd, kupdated, and bdflush at 99% under intense IO Jeff Lessem
@ 2001-04-10 20:25 ` Phil Oester
  2001-04-10 22:05   ` Alan Cox

From: Phil Oester @ 2001-04-10 20:25 UTC (permalink / raw)
To: Jeff Lessem, linux-kernel

I've seen similar 'unresponsiveness' running 2.4.3-ac2 on a qmail server. The hardware is a dual-processor PIII 650 with 1GB of RAM. SCSI is a sym53c895 with dual Quantum 9GB drives.

Any time I start injecting lots of mail into the qmail queue, *one* of the two processors gets pegged at 99%, and it takes forever for anything typed at the console to actually appear (just as you describe). But I don't see any particular user process in top using a great deal of CPU - just the system itself. In my case, however, I usually have to power-cycle the box to get it back - it totally dies.

I've started the kernel with profile=2, and had a cron job running every minute to capture a "readprofile -r; sleep 10; readprofile", but when the processor pegs, the cron jobs just stop without catching any useful information before the freeze. The interesting thing is, the box still responds to pings at this time, even though it goes hours without any profile captures.

Upon power-cycling, the qmail partition is loaded with thousands of errors - which could be caused by the power cycling, or by something kernel-related. In the meantime, I've had to revert to 2.2.19 any time I do intense mailings.

-Phil Oester

-----Original Message-----
From: Jeff Lessem
Sent: Tuesday, April 10, 2001 1:01 PM
To: linux-kernel@vger.kernel.org
Subject: kswapd, kupdated, and bdflush at 99% under intense IO

[...]
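For reference, the capture job Phil describes would look roughly like the following crontab fragment. This is a guess at his setup, not his actual config: the System.map path and logfile name are illustrative, and the kernel must have been booted with profile=2 for /proc/profile to exist at all.

```
# Hypothetical /etc/cron.d entry reconstructing the capture Phil describes:
# reset the profiling counters, accumulate for 10 seconds, then dump the
# per-function tick counts.
* * * * * root readprofile -r; sleep 10; readprofile -m /boot/System.map >> /var/log/readprofile.log
```

The weakness he ran into is inherent to this approach: cron itself needs the machine to be responsive, so the captures stop exactly when the data would be most interesting.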
* Re: kswapd, kupdated, and bdflush at 99% under intense IO
  2001-04-10 20:25 ` Phil Oester
@ 2001-04-10 22:05   ` Alan Cox
  2001-04-10 22:16     ` Rik van Riel

From: Alan Cox @ 2001-04-10 22:05 UTC (permalink / raw)
To: Phil Oester; +Cc: Jeff Lessem, linux-kernel

> Any time I start injecting lots of mail into the qmail queue, *one* of the
> two processors gets pegged at 99%, and it takes forever for anything typed
> at the console to actually appear (just as you describe). But I don't see

Yes, I've seen this case. It's partially still a mystery.

> Upon powercycling, the qmail partition is loaded with thousands of errors -
> which could be caused by the power cycling, or by something kernel related.

Under heavy I/O loads, the cerberus test suite has been showing real disk corruption on all current trees, until Ingo's patch today to fix the ext2 and minix problems, combined with the earlier fixes for other races.

In your case I suspect it's the thousands of files qmail creates and deletes, not the corruption, but it's hard to be sure.
* Re: kswapd, kupdated, and bdflush at 99% under intense IO
  2001-04-10 22:05 ` Alan Cox
@ 2001-04-10 22:16   ` Rik van Riel
  2001-04-11 12:46     ` Jan Harkes

From: Rik van Riel @ 2001-04-10 22:16 UTC (permalink / raw)
To: Alan Cox; +Cc: Phil Oester, Jeff Lessem, linux-kernel

On Tue, 10 Apr 2001, Alan Cox wrote:

> > Any time I start injecting lots of mail into the qmail queue, *one* of the
> > two processors gets pegged at 99%, and it takes forever for anything typed
> > at the console to actually appear (just as you describe). But I don't see
>
> Yes I've seen this case. Its partially still a mystery

I've seen it too. It could be some interaction between kswapd and bdflush ... but I'm not sure what the exact cause would be.

regards,

Rik
--
Virtual memory is like a game you can't win;
However, without VM there's truly nothing to lose...

http://www.surriel.com/
http://www.conectiva.com/  http://distro.conectiva.com.br/
* Re: kswapd, kupdated, and bdflush at 99% under intense IO
  2001-04-10 22:16 ` Rik van Riel
@ 2001-04-11 12:46   ` Jan Harkes

From: Jan Harkes @ 2001-04-11 12:46 UTC (permalink / raw)
To: Rik van Riel; +Cc: linux-kernel

On Tue, Apr 10, 2001 at 07:16:06PM -0300, Rik van Riel wrote:
> I've seen it too. It could be some interaction between kswapd
> and bdflush ... but I'm not sure what the exact cause would be.

Syncing dirty inodes in some cases requires page allocations. The existing code in try_to_free_pages calls shrink_icache_memory during free_shortage. So we are probably stealing the few pages that we managed to free up a bit earlier, exactly around the time that we're already critically low on memory.

The patch I sent you a while ago actually avoids this by triggering an extra run of kupdated, but doesn't sync the dirty inodes in the more critical try_to_free_pages path. I've been running it on machines with 24MB, 64MB and 512MB, and haven't had any problems.

It is noticeable that the nightly updatedb run flushes the dentry/inode cache. In the morning my email reader has to pull the email-related inodes back into memory (maildir format); it doesn't have to do this the rest of the day. As far as I am concerned, this actually shows that the system is now adapting to the kind of usage that occurs.

Jan
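The feedback loop Jan describes can be sketched in pseudocode. This is a paraphrase of his explanation, not the actual 2.4 kernel source; the function names match the ones he mentions but the bodies are simplified:

```
/* Pseudocode paraphrase of the 2.4-era reclaim interaction */
try_to_free_pages():
    freed = shrink_caches()        /* free a handful of pages          */
    if free_shortage():
        shrink_icache_memory()     /* syncs dirty inodes, which can    */
                                   /* itself allocate pages, consuming */
                                   /* the pages freed just above while */
                                   /* memory is already critically low */
```

His patch, as described, breaks the loop by deferring the inode writeback to an extra kupdated run instead of doing it inside the allocation-pressure path itself.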
end of thread, other threads:[~2001-04-11 12:47 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2001-04-10 20:01 kswapd, kupdated, and bdflush at 99% under intense IO Jeff Lessem
2001-04-10 20:25 ` Phil Oester
2001-04-10 22:05   ` Alan Cox
2001-04-10 22:16     ` Rik van Riel
2001-04-11 12:46       ` Jan Harkes
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox