* kernel go-slow
@ 2003-02-02 23:27 Russell Coker
2003-02-02 23:42 ` Rudy L. Zijlstra
` (2 more replies)
0 siblings, 3 replies; 8+ messages in thread
From: Russell Coker @ 2003-02-02 23:27 UTC
To: ReiserFS
I'm running a number of machines with 2.4.20 and the ReiserFS journal patches.
One problem that has started occurring is that periodically some of the
machines will go really slow for a while. It's as if the CPU speed has just
dropped to 1% of its regular speed. Then after 10 minutes or so it will
continue as normal.
Has anyone heard of such things before?
I am asking here first because the ReiserFS patch is the most significant
kernel patch I've applied on what is otherwise a stock 2.4.20 kernel.
Interestingly the machines that have the problems are not the most active in
the file system (mail store), but the mail spool machines. The mail spool
machines do a good amount of file access (but well below the limits of the
hardware) and also use more memory and have large load spikes on occasion
(virus and spam scanning).
--
http://www.coker.com.au/selinux/ My NSA Security Enhanced Linux packages
http://www.coker.com.au/bonnie++/ Bonnie++ hard drive benchmark
http://www.coker.com.au/postal/ Postal SMTP/POP benchmark
http://www.coker.com.au/~russell/ My home page
* Re: kernel go-slow
2003-02-02 23:27 kernel go-slow Russell Coker
@ 2003-02-02 23:42 ` Rudy L. Zijlstra
2003-02-03 4:53 ` Ookhoi
2003-02-06 11:26 ` Alexander Lyamin
2 siblings, 0 replies; 8+ messages in thread
From: Rudy L. Zijlstra @ 2003-02-02 23:42 UTC
To: Russell Coker; +Cc: ReiserFS
Russell Coker wrote:
>I'm running a number of machines with 2.4.20 and the ReiserFS journal patches.
>
>One problem that has started occuring is that periodically some of the
>machines will go really slow for a while. It's as if the CPU speed has just
>dropped to 1% of it's regular speed. Then after 10 minutes or so it will
>continue as normal.
>
>Has anyone heard of such things before?
>
>
>
Russell,
I am (was) running a vanilla 2.4.20 kernel and experienced a slow-down
each night during the virus scan. The system would stop responding to HTTP
at unpredictable moments. It was quite repeatable, happening every night,
though at a different moment each time. I've just rebooted into 2.4.19 to
check whether it's 2.4.20 or the result of a hardware modification I made 2
weeks ago. The system is lightly loaded. The file systems in use are mostly
ReiserFS, with a smattering of left-over ext2.
Cheers,
Rudy
* Re: kernel go-slow
2003-02-02 23:27 kernel go-slow Russell Coker
2003-02-02 23:42 ` Rudy L. Zijlstra
@ 2003-02-03 4:53 ` Ookhoi
2003-02-06 11:26 ` Alexander Lyamin
2 siblings, 0 replies; 8+ messages in thread
From: Ookhoi @ 2003-02-03 4:53 UTC
To: Russell Coker; +Cc: ReiserFS
Russell Coker wrote (ao):
> I'm running a number of machines with 2.4.20 and the ReiserFS journal
> patches.
>
> One problem that has started occuring is that periodically some of the
> machines will go really slow for a while. It's as if the CPU speed has
> just dropped to 1% of it's regular speed. Then after 10 minutes or so
> it will continue as normal.
>
> Has anyone heard of such things before?
It seems there is a 'bug' in 2.4.20 which causes the stall. (I don't know
the details, but you're not the only one.)
Maybe a -pre kernel fixes it, though in your case I think I would wait for
2.4.21.
* Re: kernel go-slow
2003-02-02 23:27 kernel go-slow Russell Coker
2003-02-02 23:42 ` Rudy L. Zijlstra
2003-02-03 4:53 ` Ookhoi
@ 2003-02-06 11:26 ` Alexander Lyamin
2003-02-06 16:32 ` Alexander Lyamin
2 siblings, 1 reply; 8+ messages in thread
From: Alexander Lyamin @ 2003-02-06 11:26 UTC
To: Russell Coker; +Cc: ReiserFS
Mon, Feb 03, 2003 at 12:27:40AM +0100, Russell Coker wrote:
> I'm running a number of machines with 2.4.20 and the ReiserFS journal patches.
>
> One problem that has started occuring is that periodically some of the
> machines will go really slow for a while. It's as if the CPU speed has just
> dropped to 1% of it's regular speed. Then after 10 minutes or so it will
> continue as normal.
When it slows down, please check for IO with vmstat, or watch your disk
activity LED. That's the simple and stupid method.
But there's no really good way to understand what's going on in the kernel
from userland. So go into the kernel with profiling and see where it spends
its precious time. That's slightly more complicated than the method above,
but much more effective.
>
> Has anyone heard of such things before?
>
> I am asking here first because the ReiserFS patch is the most significant
> kernel patch I've applied on what is otherwise a stock 2.4.20 kernel.
>
> Interestingly the machines that have the problems are not the most active in
> the file system (mail store), but the mail spool machines. The mail spool
> machines do a good amount of file access (but well below the limits of the
> hardware) and also use more memory and have large load spikes on occasion
> (virus and spam scanning).
--
"Cache remedies via multi-variable logic shorts will leave you crying."(cl)
Lex Lyamin
* Re: kernel go-slow
2003-02-06 11:26 ` Alexander Lyamin
@ 2003-02-06 16:32 ` Alexander Lyamin
2003-02-06 16:41 ` Russell Coker
0 siblings, 1 reply; 8+ messages in thread
From: Alexander Lyamin @ 2003-02-06 16:32 UTC
To: Alexander Lyamin; +Cc: Russell Coker, ReiserFS
Thu, Feb 06, 2003 at 02:26:49PM +0300, Alexander Lyamin wrote:
> Mon, Feb 03, 2003 at 12:27:40AM +0100, Russell Coker wrote:
> > I'm running a number of machines with 2.4.20 and the ReiserFS journal patches.
> >
> > One problem that has started occuring is that periodically some of the
> > machines will go really slow for a while. It's as if the CPU speed has just
> > dropped to 1% of it's regular speed. Then after 10 minutes or so it will
> > continue as normal.
>
> when its slows down, please check with vmstat for IO or with your
I think I wasn't clear enough.
So, first: if you "go-slow" during disk activity, chances are good
that it is caused by the FS or the VM or their misunderstandings.
But there are possible situations that will not generate disk activity
but may still cause your system to "go-slow". If you have some
unusual IO numbers while disk activity is moderate to low, it is
most likely the same sweet pair.
But Oleg Drokin pointed out situations where even IO will not indicate
what's going on :)
So the advice is still the same: if you are having slowdowns, profiling might
help you much better than the witchy methods described above.
> led for disk activity. thats a simply and stupid.
>
> but theres no really good way to understand whats goining on in kernel
> if you are userland yourself. so go in kernel with profiling and see
> where does it spend it precisious time. slightly more complicated then
> method above, but much more effective.
>
> >
> > Has anyone heard of such things before?
> >
> > I am asking here first because the ReiserFS patch is the most significant
> > kernel patch I've applied on what is otherwise a stock 2.4.20 kernel.
> >
> > Interestingly the machines that have the problems are not the most active in
> > the file system (mail store), but the mail spool machines. The mail spool
> > machines do a good amount of file access (but well below the limits of the
> > hardware) and also use more memory and have large load spikes on occasion
> > (virus and spam scanning).
Talking about virus/spam scanning: what do you use, and how is it integrated
into your SMTP MTA?
--
"Cache remedies via multi-variable logic shorts will leave you crying."(cl)
Lex Lyamin
* Re: kernel go-slow
2003-02-06 16:32 ` Alexander Lyamin
@ 2003-02-06 16:41 ` Russell Coker
2003-02-06 16:48 ` Oleg Drokin
2003-02-06 18:58 ` Hans Reiser
0 siblings, 2 replies; 8+ messages in thread
From: Russell Coker @ 2003-02-06 16:41 UTC
To: flx; +Cc: ReiserFS
On Thu, 6 Feb 2003 17:32, Alexander Lyamin wrote:
> > > One problem that has started occuring is that periodically some of the
> > > machines will go really slow for a while. It's as if the CPU speed has
> > > just dropped to 1% of it's regular speed. Then after 10 minutes or so
> > > it will continue as normal.
> >
> > when its slows down, please check with vmstat for IO or with your
>
> i think i wasnt clear enough.
> so - first , if you "go-slow" on a disk activity, chances are good
> that it caused by FS or VM or their misunderstandings.
vmstat doesn't work properly. CPU time is 99% system, which suggests that one
CPU is spending all its time in kernel space (for both threads of a
hyper-threaded CPU) or that both CPUs have each got one thread locked in
kernel space.
It's not disk related; those machines don't do a huge amount of disk access.
The machines with the serious disk activity don't have any problems.
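When vmstat output cannot be trusted, the "99% system" figure can be cross-checked directly from the aggregate "cpu" line of /proc/stat. The following is only a minimal sketch under stated assumptions: the function names system_share and read_cpu_line are made up for illustration, the field order is assumed to be user, nice, system, idle (newer kernels append further fields, which are simply counted towards the total), and the result is an approximation rather than a replacement for proper profiling.

```python
import time


def system_share(before, after):
    """Percent of CPU time spent in kernel mode between two 'cpu' lines."""
    def counters(line):
        parts = line.split()
        if not parts or parts[0] != "cpu":
            raise ValueError("expected the aggregate 'cpu' line of /proc/stat")
        return [int(p) for p in parts[1:]]

    deltas = [b - a for a, b in zip(counters(before), counters(after))]
    total = sum(deltas)
    if total <= 0:
        return 0.0
    # Assumed field order: user, nice, system, idle, ...; index 2 is system.
    return 100.0 * deltas[2] / total


def read_cpu_line(path="/proc/stat"):
    """Return the first line of /proc/stat (the all-CPU summary)."""
    with open(path) as f:
        return f.readline()


if __name__ == "__main__":
    first = read_cpu_line()
    time.sleep(5)  # sampling interval, chosen arbitrarily
    second = read_cpu_line()
    print(f"system time: {system_share(first, second):.1f}%")
```

Run from cron or a loop during a suspected go-slow, a value near 100 would agree with the observation that the CPUs are stuck in kernel space.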
> but there is possible situations that will not generate disk activity,
> but may cause your system to "go-slow", if there you have some
> unussual IO numbers while disk activity is moderate to low -
> most likely same sweet pair.
The problem is that sar etc. produce jumbled results. Profiling the kernel may
help, but may also hide the error, and it's not something I can easily do.
The servers are locked in a managed server room on the other side of the city
so seeing the blinken lights is not an option.
I've put the aa1 kernel on half the machines and now I'll wait to see what
happens. If the aa1 machines don't have the problem but the others do then
I'll go all aa1.
> > > Interestingly the machines that have the problems are not the most
> > > active in the file system (mail store), but the mail spool machines.
> > > The mail spool machines do a good amount of file access (but well below
> > > the limits of the hardware) and also use more memory and have large
> > > load spikes on occasion (virus and spam scanning).
>
> talking about virus/spam scanning - what do you use and how its integrated
> in your SMTP MTA ?
RAV. I'm not sure of the details, I think it runs as a daemon that qmail
talks to. I try to avoid the anti-virus stuff.
--
http://www.coker.com.au/selinux/ My NSA Security Enhanced Linux packages
http://www.coker.com.au/bonnie++/ Bonnie++ hard drive benchmark
http://www.coker.com.au/postal/ Postal SMTP/POP benchmark
http://www.coker.com.au/~russell/ My home page
* Re: kernel go-slow
2003-02-06 16:41 ` Russell Coker
@ 2003-02-06 16:48 ` Oleg Drokin
2003-02-06 18:58 ` Hans Reiser
1 sibling, 0 replies; 8+ messages in thread
From: Oleg Drokin @ 2003-02-06 16:48 UTC
To: Russell Coker; +Cc: flx, ReiserFS
Hello!
On Thu, Feb 06, 2003 at 05:41:46PM +0100, Russell Coker wrote:
> > but there is possible situations that will not generate disk activity,
> > but may cause your system to "go-slow", if there you have some
> > unussual IO numbers while disk activity is moderate to low -
> > most likely same sweet pair.
> The problem is that sar etc product jumbled results. Profiling the kernel may
> help, but may also hide the error, and it's not something I can easily do.
Well, you can do it very easily.
Reboot with the "profile=2" kernel option.
When the 100% sys CPU situation starts, execute readprofile -r
When it has finished, execute readprofile -m /path/to/System.map >somefile
Then sort somefile and you are done: you can now see where most of the time
is spent.
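The final "sort somefile" step can be sketched in Python as below. This is an illustration, not part of the original advice: it assumes readprofile's three-column output (clock ticks, function name, normalized load) with a closing "total" row, the helper name top_kernel_functions is hypothetical, and the SAMPLE text uses invented function names and numbers rather than real measurements.

```python
def top_kernel_functions(readprofile_text, limit=10):
    """Return up to `limit` (ticks, name) pairs, busiest function first."""
    rows = []
    for line in readprofile_text.splitlines():
        fields = line.split()
        # Assumed layout: ticks, function name, normalized load.
        if len(fields) != 3 or not fields[0].isdigit():
            continue
        ticks, name = int(fields[0]), fields[1]
        if name == "total":  # summary row, not a kernel function
            continue
        rows.append((ticks, name))
    rows.sort(key=lambda row: row[0], reverse=True)
    return rows[:limit]


# Invented sample in the assumed format; not taken from any real machine.
SAMPLE = """\
     3 func_a      0.0025
   812 func_b      1.2500
    41 func_c      0.0330
   856 total       0.0007
"""

if __name__ == "__main__":
    for ticks, name in top_kernel_functions(SAMPLE):
        print(f"{ticks:8d}  {name}")
```

In practice the text would come from somefile, for example top_kernel_functions(open("somefile").read()); the functions at the top of the list are where the kernel spent most of its ticks during the slowdown.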
> The servers are locked in a managed server room on the other side of the city
> so seeing the blinken lights is not an option.
;)
<humour>webcam</humour>
> I've put the aa1 kernel on half the machines and now I'll wait to see what
> happens. If the aa1 machines don't have the problem but the others do then
> I'll go all aa1.
Ah, if your problem was caused by highmem I/O not being present, then that might actually help.
Bye,
Oleg
* Re: kernel go-slow
2003-02-06 16:41 ` Russell Coker
2003-02-06 16:48 ` Oleg Drokin
@ 2003-02-06 18:58 ` Hans Reiser
1 sibling, 0 replies; 8+ messages in thread
From: Hans Reiser @ 2003-02-06 18:58 UTC
To: Russell Coker; +Cc: flx, ReiserFS
Russell Coker wrote:
>On Thu, 6 Feb 2003 17:32, Alexander Lyamin wrote:
>
>
>>>>One problem that has started occuring is that periodically some of the
>>>>machines will go really slow for a while. It's as if the CPU speed has
>>>>just dropped to 1% of it's regular speed. Then after 10 minutes or so
>>>>it will continue as normal.
>>>>
>>>>
>>>when its slows down, please check with vmstat for IO or with your
>>>
>>>
>>i think i wasnt clear enough.
>>so - first , if you "go-slow" on a disk activity, chances are good
>>that it caused by FS or VM or their misunderstandings.
>>
>>
>
>vmstat doesn't work properly. CPU time is 99% system which suggests that one
>CPU is spending all it's time in kernel space (for both threads of a
>hyper-threaded CPU) or that both CPUs have each got one thread locked in
>kernel space.
>
>
>
I propose that you try reversing the datalogging patch for long enough
to know whether it is our new code that is buggy.
If it is not our code, and it matters enough to justify the cost, we can
log in remotely and analyze the kernel for you for an hourly fee. Probably
the fee you charge them is good enough for us too. ;-)
--
Hans