public inbox for linux-kernel@vger.kernel.org
* High CPU usage (up to server hang) under heavy I/O load
@ 2004-08-13 14:01 Sylvain COUTANT
  2004-08-13 15:36 ` Matt Domsch
                   ` (2 more replies)
  0 siblings, 3 replies; 15+ messages in thread
From: Sylvain COUTANT @ 2004-08-13 14:01 UTC (permalink / raw)
  To: linux-kernel

Gurus,

I have a problem with one server (Dell, 1 TB RAID5 + RAID0, dual Xeon, 8 GB
RAM) which sometimes goes mad when the I/O pressure gets too high. We use
this server as a VMware server and as a backup server (200 GB are dedicated
to the backup part). We have run full hardware diagnostics and checked every
piece of software that runs on the system. We have been able to reproduce the
problem once without the VMware server running (so I believe that software is
not responsible for the problem).

We have tested kernels 2.4.22 and 2.4.26. The server is running under Debian
Woody.

The exact problem is that the Unix load sometimes goes _very high_ under I/O
activity (disk writes mainly seem to cause this). Even just two or three
tar+gzip processes can push the 15-minute load average as high as 20 or 30,
where we would expect it to be between 2 and 4. The load can climb so high
that we can't do anything on the server for some time (anywhere between a few
seconds and a few days). Our Friday night backups often hang the server until
we unplug the power to reboot it on Monday morning. From what we have seen,
kswapd and kupdated can each eat up to 70% of one CPU. Otherwise, it's very
hard to tell exactly what happens, because while the server is slowing down we
have no monitoring, no logs, no alerts, no automatic reboot (we must power
off/on to reboot), no console output... yet the machine is still pingable!

We haven't found a way to reproduce this behaviour with a specific test case.
Sometimes the server will run fine for a few days, then slow down and hang.
Sometimes it hangs just a few hours after being booted.

I have spent hours searching for information and found that our problem was
very common in early 2.4.x kernels (virtual memory management), between 2000
and 2002, on servers with large amounts of RAM. The only recent information I
found was some patches related to kernel hangs or the like, but the symptoms
described were never exactly the same as ours.

I tried playing a little with the "/proc/sys/vm/*" settings (mainly bdflush),
but I didn't see any major improvement (perhaps just because I didn't pick
good values). What I was trying to do was reduce the amount of memory the
kernel could use for dirty buffers, so that data would be flushed to disk
more regularly.
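For illustration, here is the kind of change I experimented with. The field
layout follows Documentation/sysctl/vm.txt in the 2.4 tree, and the values
below are only guesses, not recommendations:

```shell
# /proc/sys/vm/bdflush is one line of nine integers. On 2.4.2x the fields
# are roughly: nfract ndirty (unused) (unused) interval age_buffer
# nfract_sync nfract_stop_bdflush (unused), with defaults along the lines
# of "30 500 0 0 500 3000 60 20 0".
cat /proc/sys/vm/bdflush

# Lower nfract (1st field) and nfract_sync (7th field) so bdflush starts
# writing back dirty buffers earlier instead of letting them pile up
# (illustrative values only):
echo "10 500 0 0 500 3000 40 20 0" > /proc/sys/vm/bdflush
```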

I hope some people here can help ;-))


Regards,
Sylvain.


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: High CPU usage (up to server hang) under heavy I/O load
  2004-08-13 14:01 High CPU usage (up to server hang) under heavy I/O load Sylvain COUTANT
@ 2004-08-13 15:36 ` Matt Domsch
  2004-08-13 15:46   ` Sylvain COUTANT
  2004-08-13 16:20 ` Marcelo Tosatti
  2004-08-13 22:16 ` Alan Cox
  2 siblings, 1 reply; 15+ messages in thread
From: Matt Domsch @ 2004-08-13 15:36 UTC (permalink / raw)
  To: Sylvain COUTANT; +Cc: linux-kernel

On Fri, Aug 13, 2004 at 04:01:35PM +0200, Sylvain COUTANT wrote:
> Gurus,
> 
> I have a problem with one server (Dell, 1 TB RAID5 + RAID0, dual Xeon, 8 GB
> RAM)

Which server please?

-- 
Matt Domsch
Sr. Software Engineer, Lead Engineer
Dell Linux Solutions linux.dell.com & www.dell.com/linux
Linux on Dell mailing lists @ http://lists.us.dell.com

^ permalink raw reply	[flat|nested] 15+ messages in thread

* RE: High CPU usage (up to server hang) under heavy I/O load
  2004-08-13 15:36 ` Matt Domsch
@ 2004-08-13 15:46   ` Sylvain COUTANT
  2004-08-14  3:39     ` Tom Sightler
  0 siblings, 1 reply; 15+ messages in thread
From: Sylvain COUTANT @ 2004-08-13 15:46 UTC (permalink / raw)
  To: 'Matt Domsch'; +Cc: linux-kernel

Hello Matt,

> Which server please?

PE 2600 manufactured in June, with the latest PC BIOS and SCSI (PERC4/DI)
BIOS/firmware. We also tried downgrading to the previous release (as we have
another PE2600 which runs fine with it), but that didn't help.

Add-ons are:
- an Adaptec SCSI controller
- three Intel PRO/1000 Ethernet cards


As for the other PE2600: it has the same hardware configuration, but not
exactly the same usage (more memory allocated to processes and far less I/O
load). Although I'm not always satisfied with its performance, we haven't
noticed anything special on it so far.


Regards,
Sylvain.


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: High CPU usage (up to server hang) under heavy I/O load
  2004-08-13 14:01 High CPU usage (up to server hang) under heavy I/O load Sylvain COUTANT
  2004-08-13 15:36 ` Matt Domsch
@ 2004-08-13 16:20 ` Marcelo Tosatti
  2004-08-13 17:53   ` Sylvain COUTANT
  2004-08-16 10:11   ` Sylvain COUTANT
  2004-08-13 22:16 ` Alan Cox
  2 siblings, 2 replies; 15+ messages in thread
From: Marcelo Tosatti @ 2004-08-13 16:20 UTC (permalink / raw)
  To: Sylvain COUTANT; +Cc: linux-kernel, riel, andrea


Hi Sylvain, 

On Fri, Aug 13, 2004 at 04:01:35PM +0200, Sylvain COUTANT wrote:

> I have a problem with one server (Dell, 1 TB RAID5 + RAID0, dual Xeon, 8 GB
> RAM) which sometimes goes mad when the I/O pressure gets too high. We use
> this server as a VMware server and as a backup server (200 GB are dedicated
> to the backup part). We have run full hardware diagnostics and checked every
> piece of software that runs on the system. We have been able to reproduce the
> problem once without the VMware server running (so I believe that software is
> not responsible for the problem).
> 
> We have tested kernels 2.4.22 and 2.4.26. The server is running under Debian
> Woody.
> 
> The exact problem is that the Unix load sometimes goes _very high_ under I/O
> activity (disk writes mainly seem to cause this). Even just two or three
> tar+gzip processes can push the 15-minute load average as high as 20 or 30,
> where we would expect it to be between 2 and 4. The load can climb so high
> that we can't do anything on the server for some time (anywhere between a few
> seconds and a few days). Our Friday night backups often hang the server until
> we unplug the power to reboot it on Monday morning. From what we have seen,
> kswapd and kupdated can each eat up to 70% of one CPU.

The algorithms used by v2.4's kswapd/kupdate are not the smartest ones. 

> Otherwise, it's very hard to tell exactly what happens, because while the
> server is slowing down we have no monitoring, no logs, no alerts, no
> automatic reboot (we must power off/on to reboot), no console output... yet
> the machine is still pingable!

I'm not sure about the high system load; it might actually be what's expected.

The thing is, v2.4 is not the best kernel in the world when it comes to
highmem handling; v2.6 is much improved in that area.

Anyway, the hang is a bug and must be fixed. I've been unable to reproduce
such a hang on a 16GB box, though...

It might be that you are hitting the deadlock which the following patch
fixes.

Tasks which should not be allowed to dig into the memory reserves (which are
used by kswapd/kupdate to be able to free more memory) can nevertheless do
so, and the machine deadlocks.

> We haven't found a way to reproduce this behaviour with a specific test case.
> Sometimes the server will run fine for a few days, then slow down and hang.
> Sometimes it hangs just a few hours after being booted.

Are you not able to get sysrq output on the console? It would help if you
could plug in a serial cable and try to get sysrq output (SysRq+T and
SysRq+P). Have you tried sysrq?
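Something like the following should make sure SysRq is usable (this assumes
the kernel was built with CONFIG_MAGIC_SYSRQ, and that your kernel version
provides the /proc trigger file):

```shell
# Enable the magic SysRq key at runtime.
echo 1 > /proc/sys/kernel/sysrq

# On a local console: Alt+SysRq+T (task states), Alt+SysRq+P (registers).
# Over a serial console: send a break, then the letter.
# From a shell, kernels that have /proc/sysrq-trigger can request the
# same dumps to the kernel log:
echo t > /proc/sysrq-trigger
echo p > /proc/sysrq-trigger
```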

> I have spent hours searching for information and found that our problem was
> very common in early 2.4.x kernels (virtual memory management), between 2000
> and 2002, on servers with large amounts of RAM. The only recent information I
> found was some patches related to kernel hangs or the like, but the symptoms
> described were never exactly the same as ours.

Those were common in 2.4.2x, but after 2.4.22 (which contains a VM merge from
Andrea's tree, with highmem balancing improvements) this should not happen
anymore, except for this one bug found by Rik.

> I tried playing a little with the "/proc/sys/vm/*" settings (mainly bdflush),
> but I didn't see any major improvement (perhaps just because I didn't pick
> good values). What I was trying to do was reduce the amount of memory the
> kernel could use for dirty buffers, so that data would be flushed to disk
> more regularly.
> 
> I hope some people here can help ;-))

I'm willing to help track this down.

You may want to try this:

# This is a BitKeeper generated diff -Nru style patch.
#
# ChangeSet
#   2004/08/10 17:16:39-03:00 riel@redhat.com 
#   [PATCH] reserved buffers only for PF_MEMALLOC
#   
#   The buffer allocation path in 2.4 has a long standing bug,
#   where non-PF_MEMALLOC tasks can dig into the reserved pool
#   in get_unused_buffer_head().  The following patch makes the
#   reserved pool only accessible to PF_MEMALLOC tasks.
#   
#   Other processes will loop in create_buffers() - the only
#   function that calls get_unused_buffer_head() - and will call
#   try_to_free_pages(GFP_NOIO), freeing any buffer heads that
#   have become freeable due to IO completion.
#   
#   Note that PF_MEMALLOC tasks will NOT do anything inside
#   try_to_free_pages(), so it is needed that they are able to
#   dig into the reserved buffer heads while other tasks are
#   not.
#   
#   Signed-off-by:  Rik van Riel <riel@redhat.com>
# 
# fs/buffer.c
#   2004/08/10 12:34:54-03:00 riel@redhat.com +2 -1
#   reserved buffers only for PF_MEMALLOC
# 
diff -Nru a/fs/buffer.c b/fs/buffer.c
--- a/fs/buffer.c	2004-08-13 10:13:04 -07:00
+++ b/fs/buffer.c	2004-08-13 10:13:04 -07:00
@@ -1260,8 +1260,9 @@
 
 	/*
 	 * If we need an async buffer, use the reserved buffer heads.
+	 * Non-PF_MEMALLOC tasks can just loop in create_buffers().
 	 */
-	if (async) {
+	if (async && (current->flags & PF_MEMALLOC)) {
 		spin_lock(&unused_list_lock);
 		if (unused_list) {
 			bh = unused_list;


^ permalink raw reply	[flat|nested] 15+ messages in thread

* RE: High CPU usage (up to server hang) under heavy I/O load
  2004-08-13 16:20 ` Marcelo Tosatti
@ 2004-08-13 17:53   ` Sylvain COUTANT
  2004-08-16 10:11   ` Sylvain COUTANT
  1 sibling, 0 replies; 15+ messages in thread
From: Sylvain COUTANT @ 2004-08-13 17:53 UTC (permalink / raw)
  To: 'Marcelo Tosatti'; +Cc: linux-kernel, riel, andrea

Hello Marcelo,

> v2.6 is much improved in that area.

Unfortunately, I'm stuck with the Debian woody release for now, and testing
v2.6 could be a pain for us. I'll check again what I can do about this...


> It might be that you are hitting the deadlock which the following patch
> fixes.

I saw it in a previous thread but wasn't sure it was related to my problem.
I was on my way to testing it anyway!


> Are you not able to get sysrq output on the console? It would help if you
> could plug in a serial cable and try to get sysrq output (SysRq+T and
> SysRq+P). Have you tried sysrq?

When the server was hung we were not able to get anything, but I'll try
again (just to check we were using the right keystrokes ;-)


> I'm willing to help and track it down.

Thanks.

> You may want to try this:
> ...[snip]...

I'll let you know ASAP. I don't think I'll be able to reboot the server from
home this weekend. At the least, I'll prepare a new kernel with the patch and
install it on Monday morning.

Regards,
Sylvain.


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: High CPU usage (up to server hang) under heavy I/O load
  2004-08-13 14:01 High CPU usage (up to server hang) under heavy I/O load Sylvain COUTANT
  2004-08-13 15:36 ` Matt Domsch
  2004-08-13 16:20 ` Marcelo Tosatti
@ 2004-08-13 22:16 ` Alan Cox
  2004-08-16  9:13   ` Mark Watts
  2 siblings, 1 reply; 15+ messages in thread
From: Alan Cox @ 2004-08-13 22:16 UTC (permalink / raw)
  To: Sylvain COUTANT; +Cc: Linux Kernel Mailing List

On Gwe, 2004-08-13 at 15:01, Sylvain COUTANT wrote:
> I have a problem with one server (Dell, 1 TB RAID5 + RAID0, dual Xeon, 8 GB
> RAM) which sometimes goes mad when the I/O pressure gets too high. We use
> this server as a VMware server and as a backup server (200 GB are dedicated
> to the backup part). We have run full hardware diagnostics and checked every
> piece of software that runs on the system. We have been able to reproduce the
> problem once without the VMware server running (so I believe that software is
> not responsible for the problem).
> 
> We have tested kernels 2.4.22 and 2.4.26. The server is running under Debian
> Woody.

Is your RAID controller 64-bit capable? If you can, I'd also go to a 2.6
kernel for anything > 1GB, and definitely for > 4GB of RAM. The differences
are astounding, although if your PCI I/O hardware can't do 64-bit access your
box will suck whatever the kernel 8)


^ permalink raw reply	[flat|nested] 15+ messages in thread

* RE: High CPU usage (up to server hang) under heavy I/O load
  2004-08-13 15:46   ` Sylvain COUTANT
@ 2004-08-14  3:39     ` Tom Sightler
  2004-08-15  9:11       ` Lenar Lõhmus
  2004-08-15 20:30       ` Sylvain COUTANT
  0 siblings, 2 replies; 15+ messages in thread
From: Tom Sightler @ 2004-08-14  3:39 UTC (permalink / raw)
  To: Sylvain COUTANT; +Cc: 'Matt Domsch', Linux-Kernel

On Fri, 2004-08-13 at 11:46, Sylvain COUTANT wrote:
> Hello Matt,
> 
> > Which server please?
> 
> PE 2600 manufactured in June, with the latest PC BIOS and SCSI (PERC4/DI)
> BIOS/firmware. We also tried downgrading to the previous release (as we have
> another PE2600 which runs fine with it), but that didn't help.

What driver are you using for the PERC?  We have a Dell 1750 with a
PERC4/Di which uses the megaraid driver under RHEL 3, and it has this
same problem, but only when writing to drives connected to the PERC
controller.  The system also has a QLogic 2312 card connected to an EMC
CX400 storage controller, and performance to that device is fine, even
if I set up a LUN on a single ATA disk.

During heavy writes to the drives attached to the PERC4/Di the system
becomes practically unusable.  I've been wanting to try the 'megaraid2'
driver to see if it gets rid of the issue, but I haven't been able to
try it yet.

We have some older systems with PERC2/DC cards which also use the
'megaraid' driver, but they don't seem to experience this issue, so I'm a
little suspicious that this driver simply doesn't work that well with the
newer megaraid-style controllers.

Later,
Tom



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: High CPU usage (up to server hang) under heavy I/O load
  2004-08-14  3:39     ` Tom Sightler
@ 2004-08-15  9:11       ` Lenar Lõhmus
  2004-08-15 20:30       ` Sylvain COUTANT
  1 sibling, 0 replies; 15+ messages in thread
From: Lenar Lõhmus @ 2004-08-15  9:11 UTC (permalink / raw)
  To: Tom Sightler; +Cc: Sylvain COUTANT, 'Matt Domsch', Linux-Kernel

Tom Sightler wrote:

>On Fri, 2004-08-13 at 11:46, Sylvain COUTANT wrote:
>  
>
>During heavy writes to the drives attached to the PERC4/Di the system
>becomes practically unusable.  I've been wanting to try the 'megaraid2'
>driver to see if it gets rid of the issue, but I haven't been able to
>try it yet.
>  
>
I can confirm the same symptoms, and I can say that the megaraid2 driver
gets rid of this slowdown (during heavy writing). The only problem is that
you can't use dellmgr or megamon with this driver, it seems.

>We have some older systems with PERC2/DC cards which also use the
>'megaraid' driver, but they don't seem to experience this issue, so I'm a
>little suspicious that this driver simply doesn't work that well with the
>newer megaraid-style controllers.
>  
>
It's PERC3/QC here.

Lenar


^ permalink raw reply	[flat|nested] 15+ messages in thread

* RE: High CPU usage (up to server hang) under heavy I/O load
  2004-08-14  3:39     ` Tom Sightler
  2004-08-15  9:11       ` Lenar Lõhmus
@ 2004-08-15 20:30       ` Sylvain COUTANT
  2004-08-16 12:43         ` Matt Domsch
  1 sibling, 1 reply; 15+ messages in thread
From: Sylvain COUTANT @ 2004-08-15 20:30 UTC (permalink / raw)
  To: 'Tom Sightler'; +Cc: 'Matt Domsch', 'Linux-Kernel'

> What driver are you using for the PERC?

Megaraid "2". As far as I know, in newer kernels this is the only one
compiled in. We have had several problems with the previous one. Also, Matt
Domsch's page states the exact versions you need depending on your system.

Our 1750s work very well using the PERC4/DI RAID under 2.4.26's megaraid
driver, although they are not really stressed...


> I've been wanting to try the 'megaraid2' driver

Hopefully Matt will send his advice on the subject, but I think you should
try it ASAP.

Sylvain.


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: High CPU usage (up to server hang) under heavy I/O load
  2004-08-13 22:16 ` Alan Cox
@ 2004-08-16  9:13   ` Mark Watts
  2004-08-16 10:57     ` Alan Cox
  0 siblings, 1 reply; 15+ messages in thread
From: Mark Watts @ 2004-08-16  9:13 UTC (permalink / raw)
  To: linux-kernel; +Cc: Alan Cox

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1


> Is your RAID controller 64-bit capable? If you can, I'd also go to a 2.6
> kernel for anything > 1GB, and definitely for > 4GB of RAM. The differences
> are astounding, although if your PCI I/O hardware can't do 64-bit access your
> box will suck whatever the kernel 8)

Would this also mean that if I stick a 64-bit SATA RAID card (a 3ware 8506-4LP
in this case) into a 32-bit PCI slot, then I/O is always going to suck badly?

... cos I do, and I/O sucks :)

Mark.

- -- 
Mark Watts
Senior Systems Engineer
QinetiQ Trusted Information Management
Trusted Solutions and Services group
GPG Public Key ID: 455420ED

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.4 (GNU/Linux)

iD8DBQFBIHqpBn4EFUVUIO0RAr4HAKCXDnm8YM2f7jY9awix/0KVyoUvYwCeJdmU
+gatlxR+IHurHnTPXDDITqk=
=2Rrq
-----END PGP SIGNATURE-----

^ permalink raw reply	[flat|nested] 15+ messages in thread

* RE: High CPU usage (up to server hang) under heavy I/O load
  2004-08-13 16:20 ` Marcelo Tosatti
  2004-08-13 17:53   ` Sylvain COUTANT
@ 2004-08-16 10:11   ` Sylvain COUTANT
  2004-08-16 11:49     ` Sylvain COUTANT
  1 sibling, 1 reply; 15+ messages in thread
From: Sylvain COUTANT @ 2004-08-16 10:11 UTC (permalink / raw)
  To: 'Marcelo Tosatti'; +Cc: linux-kernel, riel, andrea

Hi Marcelo,

> You want to try this

The server has now been running 2.4.26 with the patch applied for about two
hours. I have triggered backups so that it is under a bit of stress.

My first feeling is that something has changed. Once the whole physical
memory was in use by the kernel, I saw some load problems rising (as before),
but the server did not hang (unlike before ;-) and the system load came down
smoothly (it took about one or two minutes).

It now looks stable under medium I/O load. I'll give it more stress tonight
and report the behaviour here.

However, kswapd is still a major CPU eater: 5 minutes of CPU time consumed
since the reboot (2 hours ago). kupdated is at 1 minute and bdflush at 12
seconds. The /proc/sys/vm settings are the standard boot-time values, with no
changes. The current 15-minute load average is near 4, which I consider a
very bad result given the applications currently running; I believe it should
be near 1...


Do you think I could achieve better results (smoother operation) by tweaking
those /proc/sys/vm settings?


Regards,
Sylvain.


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: High CPU usage (up to server hang) under heavy I/O load
  2004-08-16  9:13   ` Mark Watts
@ 2004-08-16 10:57     ` Alan Cox
  2004-08-16 12:10       ` Mark Watts
  0 siblings, 1 reply; 15+ messages in thread
From: Alan Cox @ 2004-08-16 10:57 UTC (permalink / raw)
  To: Mark Watts; +Cc: Linux Kernel Mailing List

On Llu, 2004-08-16 at 10:13, Mark Watts wrote:
> Would this also mean that if I stick a 64bit SATA raid card (a 3Ware 8506-4LP 
> in this case) into a 32bit pci slot, then I/O is always going to suck badly?
> 
> ... cos I do, and I/O sucks :)

Separate issue.

64-bit DMA is 64-bit addressing (i.e. the card can DMA from above 4GB); a
64-bit-wide slot gives double the transfer speed. I thought the 3ware 8xxx
could do 64-bit addressing, although the driver seems to indicate it cannot,
so with over 4GB it would hurt.


^ permalink raw reply	[flat|nested] 15+ messages in thread

* RE: High CPU usage (up to server hang) under heavy I/O load
  2004-08-16 10:11   ` Sylvain COUTANT
@ 2004-08-16 11:49     ` Sylvain COUTANT
  0 siblings, 0 replies; 15+ messages in thread
From: Sylvain COUTANT @ 2004-08-16 11:49 UTC (permalink / raw)
  To: 'Marcelo Tosatti'; +Cc: linux-kernel, riel, andrea

Marcelo,

Unfortunately, the system froze again a few minutes ago.

I had a top console open and was watching it when it happened. kswapd began
to go mad, eating up to 100% of one CPU. The 1-minute load went up to 17/18.
Then the system froze completely.

I was able to reset the profiling counters and gather some /proc/profile
information during the freeze:
12752 total                                      0.0085
  3415 .text.lock.filemap                         4.4993
  2436 shrink_cache                               2.3244
  1864 .text.lock.vmscan                          8.0345
  1709 .text.lock.inode                           2.5584
   959 prune_icache                               1.8163
   619 .text.lock.swap                           10.6724
   431 inode_has_buffers                          8.2885
   406 invalidate_inode_pages                     2.4167
   220 swap_out                                   0.1746
   159 __wake_up                                  0.8112
   157 statm_pgd_range                            0.3043
   103 .text.lock.ioctl                           2.4524
    22 try_to_free_buffers                        0.0671
    19 schedule                                   0.0145
    14 default_idle                               0.2692
    14 .text.lock.namei                           0.0112
    12 .text.lock.sched                           0.0253
    12 .text.lock.memory                          0.0504
    12 .text.lock.dquot                           0.0408
    12 .text.lock.buffer                          0.0189
     9 fget                                       0.1250
     7 __alloc_pages                              0.0107
     7 .text.lock.super                           0.1228
     6 unlock_page                                0.0577
     6 unix_poll                                  0.0405
     6 rmqueue                                    0.0102
     6 pipe_poll                                  0.0600
     6 __free_pages_ok                            0.0086
     5 sys_poll                                   0.0066
     5 sock_poll                                  0.1250
     5 __free_pages                               0.1562
     4 get_pid                                    0.0101
     4 do_poll                                    0.0182
     4 do_page_fault                              0.0030
     4 __pollwait                                 0.0278
     4 __generic_copy_to_user                     0.0556
     4 .text.lock.fcntl                           0.0317
     3 zap_page_range                             0.0027
     3 try_to_release_page                        0.0417
     3 system_call                                0.0536
     3 poll_freewait                              0.0441
     3 fput                                       0.0123
     3 do_pollfd                                  0.0221
     2 handle_IRQ_event                           0.0147
     2 do_gettimeofday                            0.0161
     2 balance_dirty_state                        0.0250
     2 atomic_dec_and_lock                        0.0278
     2 alloc_inode                                0.0067
     1 tcp_rcv_established                        0.0005

Ten minutes later, my terminal came back to life for a few seconds. The load
was around 75 (1-minute average). I was again able to capture some profile
information:

230650 total                                      0.1529
 97605 .text.lock.filemap                       128.5968
 42396 .text.lock.inode                          63.4671
 36755 shrink_cache                              35.0716
 20108 prune_icache                              38.0833
  9117 inode_has_buffers                        175.3269
  8156 invalidate_inode_pages                    48.5476
  7298 .text.lock.vmscan                         31.4569
  3500 __wake_up                                 17.8571
  1618 swap_out                                   1.2841
  1221 default_idle                              23.4808
  1065 .text.lock.swap                           18.3621
   334 try_to_free_buffers                        1.0183
   237 statm_pgd_range                            0.4593
   160 .text.lock.ioctl                           3.8095
   145 .text.lock.buffer                          0.2287
   116 unlock_page                                1.1154
    68 do_softirq                                 0.3091
    62 schedule                                   0.0473
    48 __free_pages                               1.5000
    35 .text.lock.dquot                           0.1190
    29 fget                                       0.4028
    27 .text.lock.sched                           0.0570
    23 __free_pages_ok                            0.0329
    22 nr_free_buffer_pages                       0.1833
    19 .text.lock.namei                           0.0152
    16 sock_poll                                  0.4000
    15 system_call                                0.2679
    15 rmqueue                                    0.0255
    15 .text.lock.super                           0.2632
    14 unix_poll                                  0.0946
    14 pipe_poll                                  0.1400
    13 fput                                       0.0533
    13 .text.lock.memory                          0.0546
    12 do_flushpage                               0.2727
    12 balance_dirty_state                        0.1500
    11 try_to_release_page                        0.1528
    11 megaraid_isr_memmapped                     0.1447
    11 get_pid                                    0.0278
    11 do_pollfd                                  0.0809
    10 __alloc_pages                              0.0153
     9 __generic_copy_to_user                     0.1250
     8 sys_poll                                   0.0106
     8 __wake_up_sync                             0.0317
     8 __pollwait                                 0.0556
     7 timer_bh                                   0.0072
     7 poll_freewait                              0.1029
     7 invalidate_bdev                            0.0188
     7 do_poll                                    0.0318
     7 d_lookup                                   0.0246

That only lasted a few seconds before the terminal froze again.

And so on... I think I'll have to reboot soon...

I hope this helps.
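For completeness, the profiles above were captured roughly like this
(assuming the kernel was booted with "profile=2" so /proc/profile exists,
and that readprofile from util-linux is installed):

```shell
# Dump the busiest kernel functions, resolving addresses against the
# System.map that matches the running kernel:
readprofile -m /boot/System.map | sort -nr | head -50

# Reset the counters before taking the next sample:
readprofile -r
```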


Regards,
Sylvain.


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: High CPU usage (up to server hang) under heavy I/O load
  2004-08-16 10:57     ` Alan Cox
@ 2004-08-16 12:10       ` Mark Watts
  0 siblings, 0 replies; 15+ messages in thread
From: Mark Watts @ 2004-08-16 12:10 UTC (permalink / raw)
  To: Alan Cox; +Cc: Linux Kernel Mailing List

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1


> On Llu, 2004-08-16 at 10:13, Mark Watts wrote:
> > Would this also mean that if I stick a 64-bit SATA RAID card (a 3ware
> > 8506-4LP in this case) into a 32-bit PCI slot, then I/O is always going to
> > suck badly?
> >
> > ... cos I do, and I/O sucks :)
>
> Separate issue.
>
> 64-bit DMA is 64-bit addressing (i.e. the card can DMA from above 4GB); a
> 64-bit-wide slot gives double the transfer speed. I thought the 3ware 8xxx
> could do 64-bit addressing, although the driver seems to indicate it cannot,
> so with over 4GB it would hurt.

We're running dual Opterons (Tyan S2875 boards) with 2GB of RAM.
These boards only have 32-bit PCI slots, so we're already not using the full
potential of the 3ware. Basically, write performance is a major bottleneck
(hardware RAID-5 with 250GB SATA drives), and writing anything over a few
megabytes usually causes the machine to stall while the write occurs.

- -- 
Mark Watts
Senior Systems Engineer
QinetiQ Trusted Information Management
Trusted Solutions and Services group
GPG Public Key ID: 455420ED

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.4 (GNU/Linux)

iD8DBQFBIKQmBn4EFUVUIO0RAjZ3AKDctamutj48XLKCMbY4FSfwgjtXEwCdGHPX
bvvd3PNk3SlilysaOYtwa6Y=
=rOlr
-----END PGP SIGNATURE-----

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: High CPU usage (up to server hang) under heavy I/O load
  2004-08-15 20:30       ` Sylvain COUTANT
@ 2004-08-16 12:43         ` Matt Domsch
  0 siblings, 0 replies; 15+ messages in thread
From: Matt Domsch @ 2004-08-16 12:43 UTC (permalink / raw)
  To: Sylvain COUTANT; +Cc: 'Tom Sightler', 'Linux-Kernel'

On Sun, Aug 15, 2004 at 10:30:57PM +0200, Sylvain COUTANT wrote:
> > What driver are you using for the PERC?
> 
> Megaraid "2". As far as I know, in newer kernels this is the only one
> compiled in. We have had several problems with the previous one. Also, Matt
> Domsch's page states the exact versions you need depending on your system.
> 
> Our 1750s work very well using the PERC4/DI RAID under 2.4.26's megaraid
> driver, although they are not really stressed...
> 
> > I've been wanting to try the 'megaraid2' driver
> 
> Hopefully Matt will send his advice on the subject, but I think you should
> try it ASAP.

All PERC4-series adapters should really use 'megaraid2' in 2.4.x, and
as far as we know, megaraid2 handles all previous LSI-based adapters
just fine too.  In 2.6.x, there is only one driver 'megaraid', which
is the megaraid2 code base - no need for an older driver there.
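A quick way to check which driver a box is actually running (a sketch; the
exact banner text and /proc layout vary by driver version, so treat the
grep patterns as assumptions):

```shell
# If the driver is modular, see which megaraid module is loaded:
lsmod | grep -i megaraid

# The driver prints its name and version at initialization:
dmesg | grep -i megaraid

# On 2.4, the megaraid drivers may also expose per-adapter details here:
ls /proc/megaraid/ 2>/dev/null
```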

Thanks,
Matt

-- 
Matt Domsch
Sr. Software Engineer, Lead Engineer
Dell Linux Solutions linux.dell.com & www.dell.com/linux
Linux on Dell mailing lists @ http://lists.us.dell.com

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread

Thread overview: 15+ messages
2004-08-13 14:01 High CPU usage (up to server hang) under heavy I/O load Sylvain COUTANT
2004-08-13 15:36 ` Matt Domsch
2004-08-13 15:46   ` Sylvain COUTANT
2004-08-14  3:39     ` Tom Sightler
2004-08-15  9:11       ` Lenar Lõhmus
2004-08-15 20:30       ` Sylvain COUTANT
2004-08-16 12:43         ` Matt Domsch
2004-08-13 16:20 ` Marcelo Tosatti
2004-08-13 17:53   ` Sylvain COUTANT
2004-08-16 10:11   ` Sylvain COUTANT
2004-08-16 11:49     ` Sylvain COUTANT
2004-08-13 22:16 ` Alan Cox
2004-08-16  9:13   ` Mark Watts
2004-08-16 10:57     ` Alan Cox
2004-08-16 12:10       ` Mark Watts
