* bad command responsiveness Proliant DL 585
@ 2006-06-14 7:59 David Osojnik
2006-06-14 9:42 ` Mike Galbraith
From: David Osojnik @ 2006-06-14 7:59 UTC
To: linux-kernel
Hello,
We have a problem with four HP ProLiant DL 585 servers, each with four AMD
Opteron processors, 16 GB of memory, three 300 GB U320 SCSI disks and
all the latest firmware. We noticed bad command responsiveness in a
production environment, and poor performance (web server and MySQL).
The problem is easily reproducible by creating a large file, 30 GB in size, with:

time dd if=/dev/zero of=test.file bs=3072 count=10240000

on the root partition, no matter which filesystem is used (reiserfs or ext3).
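As a rough check on the numbers above: dd writes bs × count bytes, so this command writes 3072 × 10240000 = 31,457,280,000 bytes, which is exactly 30,000 MiB (about 29.3 GiB, or 31.5 GB). The POSIX sh sketch below reproduces that arithmetic and adds a crude latency probe for timing the stalls from a second console. The function names dd_bytes and probe_latency are hypothetical helpers, not something from this thread, and the probe assumes a date that supports +%s (GNU coreutils and BSD date both do), so its resolution is one second.

```shell
#!/bin/sh
# Hypothetical helpers for reproducing and measuring the stalls.

# Total bytes a dd run writes: block size ($1) times block count ($2).
dd_bytes() {
    echo $(( $1 * $2 ))
}

# Run a cheap command ($2...) $1 times, one second apart, and print how
# many whole seconds each run took; a stall shows up as a large number.
probe_latency() {
    n=$1
    shift
    i=0
    while [ "$i" -lt "$n" ]; do
        start=$(date +%s)
        # Ignore the command's own exit status; only its duration matters.
        "$@" >/dev/null 2>&1 || true
        end=$(date +%s)
        echo $(( end - start ))
        i=$(( i + 1 ))
        if [ "$i" -lt "$n" ]; then
            sleep 1
        fi
    done
}
```

For example, start the dd command above in the background in one console and run probe_latency 600 ls / in another; each output line is the duration of one ls.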
What happens is that I open three more console windows and run random
commands like "ls", "w", "route -n" and "ifconfig", but the commands
freeze for a random time (from 1 minute to 15 minutes!! per command).
When a command starts working again (after 5 minutes), I try it
again and it freezes again for a random time... The strange thing is
that when one command freezes, all the other commands freeze too, and
when one starts to work, the others work too. (If top is running, it
stops refreshing for the duration of the lockup.)
We tried RAID 0, RAID 1 and RAID 5, and it's the same. We even tried
RAID controllers other than the default onboard Smart Array 5i: a
Smart Array 6402 and a MegaRAID SCSI 320-2X. The problem is still
there; the only difference is the write speed, but the lockups remain.
We also tried this on four different DL 585 servers, all with the same
problem!
The load while creating the 32 GB file was around 10 (it was 0.05 when
the file creation started), and the system was not running any services
except the defaults (after a fresh minimal install of more than one
distribution). We also tried it with all services shut down and with as
many modules unloaded as we could remove.
We tried lots of kernels and cciss modules, but all of them have the
problem, including 2.6.16.20.
Only kernel 2.4 worked better: there were still lockups, but they
happened less often, and commands executed around 50% faster.
Why am I writing to the kernel list? Because it looks like a kernel
problem to me, since kernel 2.4 behaves much better.
I hope someone can help, because we are getting really desperate.
Thanks
* Re: bad command responsiveness Proliant DL 585
2006-06-14 7:59 bad command responsiveness Proliant DL 585 David Osojnik
@ 2006-06-14 9:42 ` Mike Galbraith
2006-06-14 19:18 ` David Osojnik
From: Mike Galbraith @ 2006-06-14 9:42 UTC
To: david; +Cc: linux-kernel
On Wed, 2006-06-14 at 09:59 +0200, David Osojnik wrote:
> Hello,
>
> We have a problem with four HP ProLiant DL 585 servers, each with four AMD
> Opteron processors, 16 GB of memory, three 300 GB U320 SCSI disks and
> all the latest firmware. We noticed bad command responsiveness in a
> production environment, and poor performance (web server and MySQL).
> The problem is easily reproducible by creating a large file, 30 GB in size, with:
>
> time dd if=/dev/zero of=test.file bs=3072 count=10240000
>
> on the root partition, no matter which filesystem is used (reiserfs or ext3).
>
> What happens is that I open three more console windows and run random
> commands like "ls", "w", "route -n" and "ifconfig", but the commands
> freeze for a random time (from 1 minute to 15 minutes!! per command).
> When a command starts working again (after 5 minutes), I try it
> again and it freezes again for a random time... The strange thing is
> that when one command freezes, all the other commands freeze too, and
> when one starts to work, the others work too. (If top is running, it
> stops refreshing for the duration of the lockup.)
Does top freeze if started from an mlockall(MCL_CURRENT|MCL_FUTURE)
shell running at realtime priority?
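One way to get part of the way there from the command line is chrt (util-linux), which starts a program under the SCHED_FIFO realtime policy. It does not lock memory: locks taken with mlockall() are not inherited by fork() children and are dropped on execve(), so a genuinely memory-locked shell would have to call mlockall() itself. The sketch below, built around a hypothetical helper named rt_shell, covers only the realtime-priority half and needs root to actually start the shell.

```shell
#!/bin/sh
# Hypothetical helper: start a shell under SCHED_FIFO so that it keeps
# getting CPU time while normal tasks are stalled. Needs root.
# $1: realtime priority, 1..99 on Linux (the default of 50 is arbitrary).
rt_shell() {
    prio=${1:-50}
    case $prio in
        ''|*[!0-9]*)
            echo "rt_shell: priority must be a number" >&2
            return 2
            ;;
    esac
    if [ "$prio" -lt 1 ] || [ "$prio" -gt 99 ]; then
        echo "rt_shell: priority must be between 1 and 99" >&2
        return 2
    fi
    # chrt -f selects SCHED_FIFO; nothing here calls mlockall().
    chrt -f "$prio" "${SHELL:-/bin/sh}"
}
```

From that shell, top can be started as usual to see whether it keeps refreshing during a freeze.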
Try SysRq-T and SysRq-M during the freezes to gather info about VM and
task state while the system is stalled.
-Mike
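Besides the keyboard combinations (Alt+SysRq+T, Alt+SysRq+M), the same dumps can be requested by writing the command character to /proc/sysrq-trigger, which is handy from a remote session. A minimal sketch, assuming a kernel built with CONFIG_MAGIC_SYSRQ; the helper name sysrq_dump and the SYSRQ_TRIGGER override (there only so the function can be exercised against an ordinary file) are hypothetical. The output lands in the kernel log, so it can be collected with dmesg afterwards.

```shell
#!/bin/sh
# Hypothetical helper: request a task dump (t) and a memory report (m)
# from the kernel. Writing to /proc/sysrq-trigger requires root.
# SYSRQ_TRIGGER may point at another file for a dry run.
sysrq_dump() {
    trigger=${SYSRQ_TRIGGER:-/proc/sysrq-trigger}
    for key in t m; do
        # One open and write per key, like: echo t > /proc/sysrq-trigger
        printf '%s\n' "$key" > "$trigger" || return 1
    done
}
```

For example, run sysrq_dump during a freeze and then dmesg > sysrq.txt once the system responds again.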
* Re: bad command responsiveness Proliant DL 585
2006-06-14 9:42 ` Mike Galbraith
@ 2006-06-14 19:18 ` David Osojnik
2006-06-15 6:44 ` Mike Galbraith
From: David Osojnik @ 2006-06-14 19:18 UTC
To: Mike Galbraith; +Cc: linux-kernel
Here is the output of SysRq-T and SysRq-M:

http://www.dworf.com/sysrq.txt

Any ideas?
Thanks
Mike Galbraith wrote:
> Does top freeze if started from an mlockall(MCL_CURRENT|MCL_FUTURE)
> shell running at realtime priority?
>
> Try SysRq-T and SysRq-M during the freezes to gather info about VM and
> task state while the system is stalled.
>
> -Mike
>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>
* Re: bad command responsiveness Proliant DL 585
2006-06-14 19:18 ` David Osojnik
@ 2006-06-15 6:44 ` Mike Galbraith
2006-06-15 7:38 ` David Osojnik
[not found] ` <44910E5B.50704@dworf.com>
From: Mike Galbraith @ 2006-06-15 6:44 UTC
To: david; +Cc: linux-kernel
On Wed, 2006-06-14 at 21:18 +0200, David Osojnik wrote:
> Here is the output of SysRq-T and SysRq-M:
>
> http://www.dworf.com/sysrq.txt
>
> Any ideas?
Not really.

I see I/O jammed up on reiserfs:.text.lock.journal, but you said
reiserfs and ext3 both stall the same way. If the journal is in the
raid, I'd try moving it, but I can't really imagine seek troubles
leading to 15 minutes of grinding. I noticed that those last two
instances of bash got nailed because of atime. Do things get any better
if mounted noatime, nodiratime?
-Mike
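With atime updates enabled, reading a file or listing a directory also updates its access time, which typically means an inode write, and on a journaling filesystem journal activity, even for purely read-only commands such as ls; noatime and nodiratime turn those updates off. A minimal sketch for trying this on a running system, assuming root and the util-linux mount command; remount_noatime and has_noatime are hypothetical helper names, and the mounts-table path is a parameter only so the check can be tested against a sample file.

```shell
#!/bin/sh
# Hypothetical helpers for the noatime experiment.

# Remount a filesystem ($1, default /) without access-time updates.
# Needs root. To make it permanent, add noatime,nodiratime to the
# options column of the matching /etc/fstab line.
remount_noatime() {
    mount -o remount,noatime,nodiratime "${1:-/}"
}

# Succeed if mount point $1 lists noatime among its options in a
# /proc/mounts-style table ($2, default /proc/mounts).
has_noatime() {
    awk -v m="$1" '
        $2 == m && $4 ~ /(^|,)noatime(,|$)/ { found = 1 }
        END { exit !found }
    ' "${2:-/proc/mounts}"
}
```

After remount_noatime /, running has_noatime / && echo ok is a quick way to confirm the option took effect.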
* Re: bad command responsiveness Proliant DL 585
2006-06-15 6:44 ` Mike Galbraith
@ 2006-06-15 7:38 ` David Osojnik
[not found] ` <44910E5B.50704@dworf.com>
From: David Osojnik @ 2006-06-15 7:38 UTC
To: Mike Galbraith; +Cc: linux-kernel
Hello,

It works perfectly when noatime,nodiratime is set on the partition!!

Can I try anything else? What does this actually mean?

Thanks!!
David
Mike Galbraith wrote:
>On Wed, 2006-06-14 at 21:18 +0200, David Osojnik wrote:
>
>
>>Here is the output of SysRq-T and SysRq-M:
>>
>>http://www.dworf.com/sysrq.txt
>>
>>Any ideas?
>>
>>
>
>Not really.
>
>I see I/O jammed up on reiserfs:.text.lock.journal, but you said
>reiserfs and ext3 both stall the same way. If the journal is in the
>raid, I'd try moving it, but I can't really imagine seek troubles
>leading to 15 minutes of grinding. I noticed that those last two
>instances of bash got nailed because of atime. Do things get any better
>if mounted noatime, nodiratime?
>
> -Mike
* Re: bad command responsiveness Proliant DL 585
[not found] ` <44910E5B.50704@dworf.com>
@ 2006-06-15 8:00 ` Mike Galbraith
2006-06-15 11:18 ` David Osojnik
From: Mike Galbraith @ 2006-06-15 8:00 UTC
To: david; +Cc: linux-kernel
On Thu, 2006-06-15 at 09:38 +0200, David Osojnik wrote:
> Hello,
>
> It works perfectly when noatime,nodiratime is set on the partition!!
That's good to hear... sort of.
>
> Can I try anything else? What does this actually mean?
Besides having a constipated journal sucks rocks? ;-) Dunno. You could
try a different elevator as a shot in the dark - eliminate something.
-Mike
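On 2.6 kernels of this era the elevator can be chosen for all devices at boot with the elevator= parameter (for example elevator=deadline) or, per device, at run time through /sys/block/<dev>/queue/scheduler, where the active scheduler is shown in brackets, e.g. "noop anticipatory deadline [cfq]". A minimal sketch with hypothetical helper names; for the cciss driver the sysfs directory name typically looks like cciss!c0d0, but check /sys/block on the actual machine.

```shell
#!/bin/sh
# Hypothetical helpers for switching I/O schedulers at run time.

# Print the active scheduler, i.e. the bracketed entry, from a file in
# /sys/block/<dev>/queue/scheduler format.
current_elevator() {
    sed -n 's/.*\[\([^]]*\)].*/\1/p' "$1"
}

# Switch block device $1 (a /sys/block entry name) to scheduler $2.
# Needs root.
set_elevator() {
    echo "$2" > "/sys/block/$1/queue/scheduler"
}
```

For example, set_elevator 'cciss!c0d0' deadline followed by current_elevator '/sys/block/cciss!c0d0/queue/scheduler'; the single quotes keep an interactive bash from treating ! as history expansion.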
* Re: bad command responsiveness Proliant DL 585
2006-06-15 8:00 ` Mike Galbraith
@ 2006-06-15 11:18 ` David Osojnik
2006-06-15 12:14 ` Mike Galbraith
From: David Osojnik @ 2006-06-15 11:18 UTC
To: Mike Galbraith; +Cc: linux-kernel
Well, I tried the cfq, anticipatory, deadline and noop schedulers/elevators
with atime enabled, but none of them helped; the only thing that makes a
difference is mounting with noatime and nodiratime.
Could this be a kernel problem?
David
Mike Galbraith wrote:
>On Thu, 2006-06-15 at 09:38 +0200, David Osojnik wrote:
>
>
>>Hello,
>>
>>It works perfectly when noatime,nodiratime is set on the partition!!
>>
>>
>
>That's good to hear... sort of.
>
>
>>Can I try anything else? What does this actually mean?
>>
>>
>
>Besides having a constipated journal sucks rocks? ;-) Dunno. You could
>try a different elevator as a shot in the dark - eliminate something.
>
> -Mike
* Re: bad command responsiveness Proliant DL 585
2006-06-15 11:18 ` David Osojnik
@ 2006-06-15 12:14 ` Mike Galbraith
From: Mike Galbraith @ 2006-06-15 12:14 UTC
To: david; +Cc: linux-kernel
On Thu, 2006-06-15 at 13:18 +0200, David Osojnik wrote:
> Well, I tried the cfq, anticipatory, deadline and noop schedulers/elevators
> with atime enabled, but none of them helped; the only thing that makes a
> difference is mounting with noatime and nodiratime.
That's the result I expected.
> Could this be a kernel problem?
Sure seems like one to me. Perhaps someone with a good understanding of
journaling fs will comment. I can only speculate.
-Mike