public inbox for linux-kernel@vger.kernel.org
* 2.2.19pre3 and poor response to RT-scheduled processes?
@ 2000-12-29 20:45 Rafal Boni
  2000-12-29 21:19 ` Gregory Maxwell
  0 siblings, 1 reply; 7+ messages in thread
From: Rafal Boni @ 2000-12-29 20:45 UTC
  To: linux-mm, linux-kernel

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

[...Please CC me on any replies, as I'm not on the list(s)...]

Folks:
	I was experiencing problems with 2.2.16 where the box would go out
	to lunch for a few seconds flushing buffers or paging at inopportune
	times (is there ever an opportune time for the box to become
	non-responsive for 5 seconds? 8-).

	2.2.19pre3 makes the behaviour much better, but I still see ~ 2sec
	pauses at times.  I'm sending this to the MM list as well, since I
	believe the poor behaviour in 2.2.16 was an MM issue... I don't 
	know where the slowdowns are happening this time around.

	The box in question is running the linux-ha.org heartbeat package,
	which is an RT-scheduled, mlock()'ed process, and as such should
	get as good service as the box is able to manage.  Often, under
	high disk (and/or MM) loads, the box becomes unresponsive for a
	period ranging from ~1 sec to a high of ~2.8 sec.
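
The heartbeat source is not part of this thread, but the two properties described above (real-time scheduling and locked memory) come from standard Linux calls. Below is a minimal Python sketch, assuming Linux and Python 3.3 or later. The helper name `make_realtime` is hypothetical, and the `MCL_*` values are the x86 ones from `<sys/mman.h>` (a few architectures use different constants).

```python
import ctypes
import ctypes.util
import os

# mlockall() flags as defined in <sys/mman.h> on x86 Linux.
# Assumption: alpha, powerpc and sparc use different values.
MCL_CURRENT = 1
MCL_FUTURE = 2


def make_realtime(priority=None):
    """Hypothetical helper: switch this process to SCHED_FIFO and lock
    all of its current and future pages into RAM.

    Returns (rt_ok, locked_ok) instead of raising, because both calls
    normally need root (or CAP_SYS_NICE and CAP_IPC_LOCK).
    """
    if priority is None:
        priority = os.sched_get_priority_max(os.SCHED_FIFO)
    try:
        # pid 0 means "the calling process".
        os.sched_setscheduler(0, os.SCHED_FIFO, os.sched_param(priority))
        rt_ok = True
    except OSError:
        rt_ok = False

    libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)
    locked_ok = libc.mlockall(MCL_CURRENT | MCL_FUTURE) == 0
    return rt_ok, locked_ok
```

Neither call helps if the process blocks on a kernel lock or if the kernel itself does not reschedule, which is where the rest of this thread goes looking.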

	The test is simply running 'dd if=/dev/zero of=/u1/big-empty-file
	bs=1k count=512000 && date'.  Generally, the box will seize up at around
	the same time as the 'dd' finishes (maybe while trying to exec date?).
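
To put numbers on the freeze while the dd test runs, one option is a small watcher that sleeps in short steps and reports any wakeup that arrives far later than requested. This is a minimal Python sketch, not heartbeat's own check; `find_stalls`, `watch` and the default period and threshold are assumptions chosen for illustration.

```python
import time


def find_stalls(wakeups, period, threshold):
    """Return (index, gap) for each wakeup that arrived more than
    `threshold` seconds later than the requested `period`."""
    stalls = []
    for i in range(1, len(wakeups)):
        gap = wakeups[i] - wakeups[i - 1]
        if gap - period > threshold:
            stalls.append((i, gap))
    return stalls


def watch(duration=30.0, period=0.01, threshold=0.5):
    """Hypothetical watcher: sleep `period` seconds at a time for
    `duration` seconds. A monotonic clock is used so wall-clock
    adjustments are not mistaken for stalls."""
    wakeups = [time.monotonic()]
    end = wakeups[0] + duration
    while wakeups[-1] < end:
        time.sleep(period)
        wakeups.append(time.monotonic())
    return find_stalls(wakeups, period, threshold)
```

Run `watch()` in one shell and start the dd command in another; each reported gap is one pause as seen by a sleeping process. A watcher that is not itself real-time will also count ordinary scheduling delay, not just kernel stalls.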

	I'd appreciate any hints on how to shrink the non-responsiveness
	window as much as possible.  I haven't yet looked to see if
	there is a version of the low-latency patches for 2.2.18 or 19pre,
	but I'd appreciate other ideas on tracking this down as well.

Thanks!
- --rafal

- ----
Rafal Boni                                               rafal.boni@eDial.com
 PGP key C7D3024C, print EA49 160D F5E4 C46A 9E91  524E 11E0 7133 C7D3 024C
    Need to get a hold of me?  http://800.eDial.com/rafal.boni@eDial.com

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.0.0 (GNU/Linux)
Comment: Exmh version 2.1.1 10/15/1999

iD8DBQE6TPfjEeBxM8fTAkwRAiPaAKDSp1udFSypqq838fwAjQnlFW0m2wCgtycm
xF7xuBroSl3YXCTqUXGDAy0=
=JHLL
-----END PGP SIGNATURE-----

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
Please read the FAQ at http://www.tux.org/lkml/

* Re: 2.2.19pre3 and poor response to RT-scheduled processes?
  2000-12-29 20:45 2.2.19pre3 and poor response to RT-scheduled processes? Rafal Boni
@ 2000-12-29 21:19 ` Gregory Maxwell
  2000-12-29 21:54   ` Rafal Boni
  0 siblings, 1 reply; 7+ messages in thread
From: Gregory Maxwell @ 2000-12-29 21:19 UTC
  To: Rafal Boni; +Cc: linux-mm, linux-kernel

On Fri, Dec 29, 2000 at 03:45:23PM -0500, Rafal Boni wrote:
[snip]
> 	The box in question is running the linux-ha.org heartbeat package,
> 	which is an RT-scheduled, mlock()'ed process, and as such should
> 	get as good service as the box is able to manage.  Often, under
> 	high disk (and/or MM) loads, the box becomes unresponsive for a
> 	period of time from ~ 1 sec to a high of ~ 2.8sec.
[snip]

You are running IDE, aren't you?

Enable DMA and/or unmask interrupts.

man hdparm
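
For reference, the two settings correspond to hdparm's `-d1` (use DMA) and `-u1` (unmask other interrupts while servicing a disk interrupt). A small Python wrapper, assuming hdparm is installed and the script runs as root; `hdparm_args` and `apply_hdparm` are hypothetical names. The hdparm man page warns that `-u1` can cause filesystem corruption on some drive/controller combinations, so test it before relying on it.

```python
import subprocess


def hdparm_args(device, dma=True, unmask_irq=True):
    """Hypothetical helper: build the hdparm command line for an IDE
    device such as /dev/hda. -d1/-d0 sets using_dma, -u1/-u0 sets the
    interrupt-unmask flag."""
    return [
        "hdparm",
        "-d1" if dma else "-d0",
        "-u1" if unmask_irq else "-u0",
        device,
    ]


def apply_hdparm(device, **flags):
    # Needs root and a real IDE drive; check=True raises
    # subprocess.CalledProcessError if hdparm exits non-zero.
    return subprocess.run(hdparm_args(device, **flags), check=True)
```

For example, `apply_hdparm("/dev/hda")` runs `hdparm -d1 -u1 /dev/hda`. The settings do not survive a reboot, so they usually go into a boot script.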

Good luck.

* Re: 2.2.19pre3 and poor response to RT-scheduled processes?
  2000-12-29 21:19 ` Gregory Maxwell
@ 2000-12-29 21:54   ` Rafal Boni
  2000-12-30 18:16     ` Andrea Arcangeli
  0 siblings, 1 reply; 7+ messages in thread
From: Rafal Boni @ 2000-12-29 21:54 UTC
  Cc: linux-mm, linux-kernel

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

In message <20001229161927.A560@xi.linuxpower.cx>, Greg Maxwell wrote:

- -> You are running IDE aren't you?
- -> 
- -> Enable DMA and/or unmask interrupts.

D'oh!  Thanks to Greg for the clue-by-four!  I *am* running IDE and I had
both DMA (due to a misreading of the kernel boot messages) and interrupt
unmasking (since I had forgotten that one) off....

I had assumed that DMA was on because the kernel messages mention it
(which, on closer reading, indicate the CMOS/BIOS-configured default modes,
not what the kernel is using) and because there was no explicit message
along the lines of "I know it's there, but I'm not going to use it all the
same" 8-)
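
The takeaway is to check what the kernel is actually using rather than what the boot messages report. Running `hdparm /dev/hda` with no options prints the current `using_dma` and `unmaskirq` values. Kernels with the old IDE driver's /proc support also expose them in `/proc/ide/<drive>/settings`; the Python parser below assumes that file's whitespace-separated `name value min max mode` layout, and the function names are hypothetical.

```python
def parse_ide_settings(text):
    """Turn the /proc/ide/<drive>/settings table into {name: value}.

    Assumes a header row starting with "name", a dashed separator row,
    and integer values in the second column; other rows are skipped.
    """
    settings = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2 or fields[0] in ("name", "----"):
            continue
        try:
            settings[fields[0]] = int(fields[1])
        except ValueError:
            continue
    return settings


def kernel_ide_flags(drive="hda"):
    """Hypothetical helper: return (using_dma, unmaskirq) as the kernel
    currently has them; either is None if the row is missing."""
    with open("/proc/ide/%s/settings" % drive) as f:
        settings = parse_ide_settings(f.read())
    return settings.get("using_dma"), settings.get("unmaskirq")
```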

Now my box behaves much more reasonably... I'll just have to beat harder
on it and see what happens.

Thanks for the help,
- --rafal

- ----
Rafal Boni                                              rafal.boni@eDial.com
 PGP key C7D3024C, print EA49 160D F5E4 C46A 9E91  524E 11E0 7133 C7D3 024C
    Need to get a hold of me?  http://800.edial.com/rafal.boni@eDial.com

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.0.0 (GNU/Linux)
Comment: Exmh version 2.1.1 10/15/1999

iD8DBQE6TQgOEeBxM8fTAkwRArCFAKDVrzaWxGtRFR0pbyNwvIF20bOSiwCfdhg9
wK1ZAhaCfK5qcrQezDECiK4=
=9x6E
-----END PGP SIGNATURE-----

* Re: 2.2.19pre3 and poor response to RT-scheduled processes?
  2000-12-29 21:54   ` Rafal Boni
@ 2000-12-30 18:16     ` Andrea Arcangeli
  2000-12-30 19:09       ` Linus Torvalds
  0 siblings, 1 reply; 7+ messages in thread
From: Andrea Arcangeli @ 2000-12-30 18:16 UTC
  To: Rafal Boni; +Cc: linux-mm, linux-kernel, Gregory Maxwell

On Fri, Dec 29, 2000 at 04:54:23PM -0500, Rafal Boni wrote:
> Now my box behaves much more reasonably... I'll just have to beat harder
> on it and see what happens.

Another thing: if you want low-latency readers while writing to disk, you can do:

	elvtune -r 1 /dev/hd[abcd]

The 1-2 second stalls you see could simply be due to applications that wait
for I/O synchronously while the elevator is reordering I/O requests (and even
if the elevator didn't reorder anything, new requests would go to the end of
the I/O queue, so they would see somewhat higher latency anyway). That's
normal, and if that's the case, the only way to avoid those stalls is to
decrease the I/O load or increase disk throughput ;). The important thing is
that the kernel is not sitting in a tight loop without a reschedule in it
during those 2 seconds.
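
To make the read-latency knob concrete, here is a deliberately simplified Python model of the idea, not the kernel's elevator code: each queued request carries a budget of how many later requests may still be sorted ahead of it, reads start with `read_latency` and writes with `write_latency`, and a request whose budget is used up acts as a barrier. `Request`, `insert` and `read_position` are hypothetical names, and the real 2.2/2.4 elevator differs in detail.

```python
from dataclasses import dataclass


@dataclass
class Request:
    sector: int
    is_read: bool
    budget: int  # later requests that may still be placed ahead of this one


def insert(queue, sector, is_read, read_latency, write_latency):
    """Simplified model: insert in sector order, walking back from the
    tail, but stop at any request whose budget is exhausted. Every
    request that gets passed loses one unit of budget."""
    budget = read_latency if is_read else write_latency
    pos = len(queue)
    while (pos > 0 and queue[pos - 1].sector > sector
           and queue[pos - 1].budget > 0):
        pos -= 1
    for passed in queue[pos:]:
        passed.budget -= 1
    queue.insert(pos, Request(sector, is_read, budget))
    return pos


def read_position(read_latency, n_writes, write_latency=10000):
    """Queue one read at a high sector, then n_writes writes at lower
    sectors; return how many requests end up ahead of the read."""
    queue = []
    insert(queue, 1000, True, read_latency, write_latency)
    for i in range(n_writes):
        insert(queue, 10 + i, False, read_latency, write_latency)
    return next(i for i, r in enumerate(queue) if r.is_read)
```

In this model `read_position(1, 5)` is 1 (only the first write gets ahead of the read), while `read_position(100, 5)` is 5 (every write is sorted in front of it). That is the trade-off `elvtune -r` exposes: a small read latency gives up some seek ordering so a waiting reader is not pushed back by a stream of writes.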

However, 2.2.19pre3aa4 also includes the low-latency bugfixes, in case you have
tons of RAM and you're passing huge buffers to syscalls.

Andrea

* Re: 2.2.19pre3 and poor response to RT-scheduled processes?
  2000-12-30 18:16     ` Andrea Arcangeli
@ 2000-12-30 19:09       ` Linus Torvalds
  2000-12-30 19:25         ` Alexander Viro
  0 siblings, 1 reply; 7+ messages in thread
From: Linus Torvalds @ 2000-12-30 19:09 UTC
  To: linux-kernel

In article <20001230191639.E9332@athlon.random>,
Andrea Arcangeli  <andrea@suse.de> wrote:
>On Fri, Dec 29, 2000 at 04:54:23PM -0500, Rafal Boni wrote:
>> Now my box behaves much more reasonably... I'll just have to beat harder
>> on it and see what happens.
>
>Another thing: while writing to disk if you want low latency readers you can
>do:
>
>	elvtune -r 1 /dev/hd[abcd]
>
>The 1/2 seconds stalls you see could be just because of applications that wait for
>I/O synchronously while the elevator is reordering I/O requests (and even if the
>elevator wouldn't reorder anything the new requests would go to the end of the
>I/O queue so they would have some higher latency anyways).

That sounds like too long a stall to be due to elevator ordering except
with some _really_ unlucky access patterns (or with slow disks). 
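
The point can be checked with simple arithmetic: a synchronous request that lands at the tail of the queue waits roughly for every request ahead of it to be serviced. A back-of-the-envelope Python sketch; the per-request service time is an assumed input, not a figure from this thread, and the function names are hypothetical.

```python
import math


def tail_wait_seconds(queued_requests, ms_per_request):
    """Approximate wait for a request queued behind `queued_requests`
    others, each taking an assumed `ms_per_request` milliseconds."""
    return queued_requests * ms_per_request / 1000.0


def requests_for_stall(stall_seconds, ms_per_request):
    """How many requests must be queued ahead to produce a given stall."""
    return math.ceil(stall_seconds * 1000.0 / ms_per_request)
```

With an assumed 10 ms per request, a 2-second stall needs `requests_for_stall(2.0, 10.0)`, i.e. 200 requests queued ahead of the reader, so a stall of that length needs either a very deep queue or requests that are each much slower (seeky access or a slow disk), matching the caveat above.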

There are other, equally likely, candidates for these kinds of stalls:

 - filesystem locks. Especially the ext2 superblock lock. You can easily
   hit this one, as some ext2 functions actually do a lot of IO while
   holding the lock.

 - synchronously waiting for bdflush with balance_dirty_buffers().
   Especially mixed with the above.

A mixture of the two above will basically stall the whole machine: almost
any non-cached file access ends up waiting for the superblock lock and
bdflush, and it can easily get quite unfair.
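
A toy Python model of the first candidate, assuming nothing about ext2's real code paths: one thread holds a lock (standing in for the ext2 superblock lock) while it performs slow "I/O", and an unrelated access that needs the same lock waits for the whole duration. All names are hypothetical, and the bdflush half of the scenario is not modelled.

```python
import threading
import time


def allocator_doing_io(sb_lock, holding, io_seconds):
    # Stands in for an ext2 path that does disk I/O under lock_super().
    with sb_lock:
        holding.set()
        time.sleep(io_seconds)  # the slow "I/O"


def unrelated_access(sb_lock):
    # Any other non-cached access that needs the same lock.
    start = time.monotonic()
    with sb_lock:
        pass
    return time.monotonic() - start


def measure_wait(io_seconds=0.3):
    """Return how long the unrelated access waited for the lock."""
    sb_lock = threading.Lock()
    holding = threading.Event()
    t = threading.Thread(target=allocator_doing_io,
                         args=(sb_lock, holding, io_seconds))
    t.start()
    holding.wait()  # make sure the allocator owns the lock first
    waited = unrelated_access(sb_lock)
    t.join()
    return waited
```

In this model `measure_wait(0.3)` comes back close to 0.3 seconds: the waiter is blocked on the lock, not on the CPU, so how urgently it wants to run does not matter.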

		Linus

* Re: 2.2.19pre3 and poor response to RT-scheduled processes?
  2000-12-30 19:09       ` Linus Torvalds
@ 2000-12-30 19:25         ` Alexander Viro
  2000-12-30 19:31           ` Linus Torvalds
  0 siblings, 1 reply; 7+ messages in thread
From: Alexander Viro @ 2000-12-30 19:25 UTC
  To: Linus Torvalds; +Cc: linux-kernel



On 30 Dec 2000, Linus Torvalds wrote:

> There are other, equally likely, candidates for these kinds of stalls:
> 
>  - filesystem locks. Especially the ext2 superblock lock. You can easily
>    hit this one, as some ext2 functions actually do a lot of IO while
>    holding the lock.

Hmm... In 2.4 we can make the situation with superblock lock on ext2
much better. I didn't go the whole way down to spinlocks, but right now
I'm sitting on a box with modified ext2 that doesn't do _any_ IO in
protected parts of ext2_new_inode()/ext2_new_block(). I can try to
extract the relevant parts of the patch if you are interested (it also
got directories-in-pagecache stuff and better SMP threading of
get_block()/truncate()). The thing seems to be working fine and I see
no serious contention on lock_super(). Dunno if it's worth doing before
2.4.0, but since it has zero impact on the rest of the tree (OK, zero except
that write_on_page() had been exported, but I could trivially get rid
of that)... Maybe 2.4.early would be a good idea.
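
The patch itself is not in this thread. As a generic illustration of the principle described (no I/O inside the protected section), here is a Python sketch with hypothetical names, not ext2 code: the slow bitmap read happens before the lock is taken, and the critical section only touches memory, rechecking in case another thread loaded the same bitmap first.

```python
import threading

lock = threading.Lock()  # stands in for the superblock lock


def alloc_bit(read_bitmap, bitmaps, group):
    """Hypothetical allocator: claim a free bit in block group `group`.

    read_bitmap(group) models the slow disk read and is called without
    the lock held; the locked section does only in-memory work.
    """
    if group not in bitmaps:
        loaded = read_bitmap(group)  # I/O outside the lock
        with lock:
            # Another thread may have loaded it meanwhile; keep the first.
            bitmaps.setdefault(group, loaded)
    with lock:
        bits = bitmaps[group]
        for i, used in enumerate(bits):
            if not used:
                bits[i] = True
                return i
    return None
```

Compared with doing the read inside `with lock:`, the time the lock is held no longer depends on disk speed.
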
							Cheers,
								Al

* Re: 2.2.19pre3 and poor response to RT-scheduled processes?
  2000-12-30 19:25         ` Alexander Viro
@ 2000-12-30 19:31           ` Linus Torvalds
  0 siblings, 0 replies; 7+ messages in thread
From: Linus Torvalds @ 2000-12-30 19:31 UTC
  To: Alexander Viro; +Cc: linux-kernel



On Sat, 30 Dec 2000, Alexander Viro wrote:
> On 30 Dec 2000, Linus Torvalds wrote:
> 
> > There are other, equally likely, candidates for these kinds of stalls:
> > 
> >  - filesystem locks. Especially the ext2 superblock lock. You can easily
> >    hit this one, as some ext2 functions actually do a lot of IO while
> >    holding the lock.
> 
> Hmm... In 2.4 we can make the situation with superblock lock on ext2
> much better.

Actually, 2.4.x right now is worse than 2.2.x in this regard, for a really
simple reason: 2.2.x will only do the equivalent of "rebalance_dirty" when
it dirties a previously clean buffer. The current 2.4.x code does that
regardless of whether the buffer was dirty before or not.

I want to see your patches to fix this for good in a 2.5.x timeframe (or,
if they are really clean and obvious, at a later 2.4.x date), but for
2.4.x I think that we'll do either "remove rebalance dirty completely" or
at the very least we'll not re-balance for re-dirtying a dirty buffer.

The re-dirtying a dirty buffer is the common case for the superblock
stuff: bitmap blocks etc are often dirty already, _especially_ in the case
of an active writer. So 2.4.x is actually more likely to hit the
superblock/bdflush contention.
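
The argument reduces to counting. The Python sketch below is a toy with hypothetical names: it counts how many "rebalance" calls a sequence of mark-dirty events triggers under the 2.2.x rule (only on a clean-to-dirty transition) versus the then-current 2.4.x rule (on every call). It ignores bdflush writing buffers back, which would make some of them clean again.

```python
def count_rebalances(events, rebalance_on_redirty):
    """Toy model. events: buffer ids in the order they are marked dirty.
    rebalance_on_redirty=False is the 2.2.x rule, True the 2.4.x one."""
    dirty = set()
    calls = 0
    for buf in events:
        was_clean = buf not in dirty
        dirty.add(buf)
        if was_clean or rebalance_on_redirty:
            calls += 1
    return calls


def allocation_events(n_allocations):
    """An active writer: each allocation re-dirties the same bitmap
    block and dirties one new block."""
    events = []
    for i in range(n_allocations):
        events += ["bitmap", "block%d" % i]
    return events
```

For 100 allocations, `count_rebalances(allocation_events(100), False)` is 101 (the bitmap's first dirtying plus one per new block), while `count_rebalances(allocation_events(100), True)` is 200: every re-dirtying of the already-dirty bitmap block adds another rebalance, and with it another chance to wait on bdflush.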

Of course, 2.4.x has had so many improvements in file writing memory
pressure that it might not end up being that noticeable, but even so..

		Linus
