* [BENCH] Problems with IO throughput and fairness with 2.4.10 and 2.4.9-ac15
@ 2001-09-27 12:46 Robert Cohen
2001-09-28 8:26 ` Stephan von Krawczynski
` (2 more replies)
0 siblings, 3 replies; 11+ messages in thread
From: Robert Cohen @ 2001-09-27 12:46 UTC (permalink / raw)
To: linux-kernel
Given the recent flurry of changes in the Linux kernel VM subsystems I
decided to do a bit of benchmarking.
The benchmark is a test of file server performance. I originally did
this test about a year ago with fairly dismal results, so I thought I'd
see how much things had improved.
The good news, things have improved. The bad news, they're still not
good.
The test consists of a Linux server acting as a file server (using
netatalk) and 5 Macintosh clients. The clients each write a 30 Meg file
and read it back. Each client repeats this 10 times. Total amount of IO
in the test: 1.5 Gigs written, 1.5 Gigs read.
The tests were done with the following kernels
2.4.10: stock 2.4.10 kernel
2.4.10-aa1: 2.4.10 with Andrea's aa1 patch including his vm-tweaks-1
2.4.10-p: 2.4.10 with Robert Love's preempt patch
2.4.9-ac15: Alan's latest
2.4.9-ac15-al: 2.4.9-ac15 with Rik's Aging+Launder patch
2.4.9-ac15 didn't fare too well, but Rik's patch resolved these problems,
so I will leave 2.4.9-ac15 out of the discussion.
The hardware was a UP P-II 266 with 256 Megs of memory using SCSI disks
on an Adaptec wide controller. The clients and server were all connected
to a 100 Mbit switch.
The hardware is nothing special, but disks and LAN are all capable of
pushing 10 MB/s of bandwidth.
In the test, the clients are each accessing 30 Meg files. With 5
clients, that's a working set of 150 Megs of disk space being
accessed. With 256 Megs of memory, all the files can fit in memory. I
don't consider this to be a realistic test of file server behaviour
since if all your files on a file server can fit in memory you bought
too much memory :-).
So for all the tests, the file server memory was limited to 128 Megs via
LILO except for a baseline test with 256 Megs.
The features of a file server that I consider important are obviously
file-serving throughput, but also fairness: all clients should get
an equal share of the bandwidth. So for the tests, I report the time
that the last client finishes the run, which indicates total throughput,
and the time the first client finishes, which ideally should be not too
much before the last client.
Summary of the results
======================
In the baseline test with 256 Megs of memory, all the kernels performed
flawlessly. Close to 10 MB/s of throughput was achieved, evenly spread
between the clients.
In the real test with 128 Megs of memory, things didn't go as well. All
the kernels performed similarly but none were satisfactory. The problem
I saw was that all the clients would start out getting fairly bad
throughput of only a few MB/sec total amongst all the machines. This is
accompanied by heavy seeking of the disk (based on the sound).
Then one of the clients would "get in the groove". The good client gets
the full 10 MB/s of bandwidth and the rest are completely starved. The
good client zooms through to the finish with the rest of the clients
only just started. Once the good client finished, the disks seek madly
for a while with poor throughput until another client "gets in the
groove".
Once you are down to 2 or 3 clients left, things settle down because the
files all fit in memory again.
Overall, the total throughput is not that bad, but the fact that it
achieves this by starving clients to let one client at a time proceed is
completely unacceptable for a file server.
Note: this is not an accurate benchmark in that the run times are not
highly repeatable. This means it can't be used for fine tuning kernels.
But at the moment, I am not concerned with fine tuning but a huge gaping
hole in Linux file-serving performance. And it's probably true that the
non-repeatability indicates a problem in itself. With a well-tuned
kernel, results should be much more repeatable.
Detailed results
================
Here are the timing runs for each kernel. Times are minutes:seconds. I
did two runs for each.
Vmstat 5 outputs are available at
http://tltsu.anu.edu.au/~robert/linux_logs/
But none of the vmstat output shows any obvious problems. None of the
kernels used much swap, and I didn't see any problems with daemons like
kswapd chewing time.
Baseline run with 256 Megs
Run 1 First finished 4:05 Last finished: 4:18
Notes: this indicates best case performance
linux-2.4.10:
Run 1 First finished 2:15 Last finished: 5:36
Run 2 First finished 1:41 Last finished: 6:36
Linux-2.4.10-aa1
Run 1 First finished 3:38 Last finished: 8:40
Run 2 First finished 1:35 Last finished: 7:07
Notes: slightly worse than straight 2.4.10
Linux-2.4.10-p
Run 1 First finished 1:39 Last finished: 8:33
Run 2 First finished 1:46 Last finished: 6:10
Notes: no better than 2.4.10. Of course the preempt kernel is not
advertised as a server OS, but since the problems observed are primarily
fairness problems, I hoped it might help.
Linux-2.4.9-ac15-al
Run 1 First finished 2:00 Last finished: 5:30
Run 2 First finished 1:45 Last finished: 5:07
Notes: this has slightly better behaviour than 2.4.10 in that 2 clients
tend to "get in the groove" at a time and finish early and then another
2 etc.
Analysis
========
In the baseline test with 256 Megs, since all the files fit in page
cache, there is no reading at all. Only writing. The VM seems to handle
this flawlessly.
In the 128 Meg tests, reads start happening as well as writes since
things get flushed out of the page cache.
The VM doesn't cope with this as well. The symptom of heavy seeking with
poor throughput seen in this test is one I associate with poor elevator
performance. If the elevator doesn't group requests enough, you get disk
behaviour like "small read, seek, small read, seek" instead of grouping
things into large reads or multiple reads between seeks.
The problem where one client gets all the bandwidth has to be some kind
of livelock.
Normally I might suspect that the locked-out processes have been swapped
out, but in this case no swap is being used. I suppose their process
pages could have been flushed to make space for page cache pages.
But this would show up as an increased page cache size in vmstat, which
doesn't seem to be the case.
Ironically I believe this is associated with the elevator sorting
requests too aggressively.
All the file data for the processes that are locked out must be flushed
out of page cache, and the locked process can't get enough reads
scheduled to make any progress. Disk operations are coming in for the
"good" process fast enough to keep the disk busy, these are sorted to
the top by the elevator since they are near the current head position.
And no one else gets to make any progress.
It has been suggested that the problems might be specific to netatalk.
However, I have been unable to find anything that would indicate that
netatalk is doing anything odd. Stracing the file server processes shows
that they are just doing 8k reads and writes. The files are not opened
O_SYNC and the file server processes aren't doing any fsync calls. This is
supported by the fact that the performance is fine with 256 Megs of
memory.
I have been unable to find any non-networked test that demonstrates the
same problems.
Tests such as 5 simultaneous bonnie runs or a tiotest with 5 threads
that are superficially doing the same things don't see the same
problems.
What I believe is the cause: since we have 5 clients fighting for
network bandwidth, the packets from each client are coming in
interleaved, so the granularity of operations that the server does is
very fine.
In a local test such as 5 bonnies, each process gets a full time
slice accessing its file before the next file is accessed, which leads
to much coarser-grained access.
So I suppose a modified version of tiotest that does a sched_yield
after each read or write might show the same problems, but I haven't
tested this theory.
If anyone has tests they would like me to do, or any patches they would
like me to try please let me know.
--
Robert Cohen
Unix Support, TLTSU
Australian National University
Ph: 612 58389 robert.cohen@anu.edu.au
* Re: [BENCH] Problems with IO throughput and fairness with 2.4.10 and 2.4.9-ac15
2001-09-27 12:46 [BENCH] Problems with IO throughput and fairness with 2.4.10 and 2.4.9-ac15 Robert Cohen
@ 2001-09-28 8:26 ` Stephan von Krawczynski
2001-09-28 9:00 ` linux-2.4.9-ac15 and -ac16 compile error Zakhar Kirpichenko
2001-09-28 8:51 ` [BENCH] Problems with IO throughput and fairness with 2.4.10 and 2.4.9-ac15 Gerold Jury
2001-10-12 8:24 ` Andrea Arcangeli
2 siblings, 1 reply; 11+ messages in thread
From: Stephan von Krawczynski @ 2001-09-28 8:26 UTC (permalink / raw)
To: Robert Cohen; +Cc: linux-kernel
On Thu, 27 Sep 2001 22:46:17 +1000 Robert Cohen <robert.cohen@anu.edu.au>
wrote:
> Given the recent flurry of changes in the Linux kernel VM subsystems I
> decided to do a bit of benchmarking.
> The benchmark is a test of file server performance. I originally did
> this test about a year ago with fairly dismal results, so I thought I'd
> see how much things had improved.
Hello,
do you have a comparison to 2.2.19?
Regards,
Stephan
* linux-2.4.9-ac15 and -ac16 compile error
2001-09-28 8:26 ` Stephan von Krawczynski
@ 2001-09-28 9:00 ` Zakhar Kirpichenko
2001-09-28 10:02 ` Keith Owens
0 siblings, 1 reply; 11+ messages in thread
From: Zakhar Kirpichenko @ 2001-09-28 9:00 UTC (permalink / raw)
To: linux-kernel
Hello there.
I've got a problem compiling linux-2.4.9-ac15 and -ac16. When I'm
trying to compile APM support as a module, I get this during the depmod
section of 'make modules_install' (and when starting 'depmod' manually):
if [ -r System.map ]; then /sbin/depmod -ae -F System.map 2.4.9-z6; fi
depmod: *** Unresolved symbols in
/lib/modules/2.4.9-z6/kernel/arch/i386/kernel/apm.o
depmod: __sysrq_unlock_table
depmod: __sysrq_get_key_op
depmod: __sysrq_put_key_op
depmod: __sysrq_lock_table
Also I've got a problem compiling APM into the kernel:
ld -m elf_i386 -T /usr/src/linux-2.4.9-ac16/arch/i386/vmlinux.lds -e stext
[bla-bla-bla]
/usr/src/linux-2.4.9-z6/arch/i386/lib/lib.a \
--end-group \
-o vmlinux
arch/i386/kernel/kernel.o: In function `apm':
arch/i386/kernel/kernel.o(.text+0xbf8a): undefined reference to `__sysrq_lock_table'
arch/i386/kernel/kernel.o(.text+0xbf91): undefined reference to `__sysrq_get_key_op'
arch/i386/kernel/kernel.o(.text+0xbfa4): undefined reference to `__sysrq_put_key_op'
arch/i386/kernel/kernel.o(.text+0xbfac): undefined reference to `__sysrq_unlock_table'
make: *** [vmlinux] Error 1
It was okay before -ac15.
gcc version report is: gcc version 2.95.3 20010315 (release).
Any ideas?
--
Zakhar Kirpichenko,
ZAK-UANIC
* Re: linux-2.4.9-ac15 and -ac16 compile error
2001-09-28 9:00 ` linux-2.4.9-ac15 and -ac16 compile error Zakhar Kirpichenko
@ 2001-09-28 10:02 ` Keith Owens
0 siblings, 0 replies; 11+ messages in thread
From: Keith Owens @ 2001-09-28 10:02 UTC (permalink / raw)
To: Zakhar Kirpichenko; +Cc: linux-kernel
On Fri, 28 Sep 2001 12:00:33 +0300 (EEST),
Zakhar Kirpichenko <zakhar@mirotel.net> wrote:
> I've got a problem compiling linux-2.4.9-ac15 and -ac16. When I'm
>trying to compile APM support as module, I get this during depmod section
>of 'make modules_install' (and when start 'depmod' manually):
>
>if [ -r System.map ]; then /sbin/depmod -ae -F System.map 2.4.9-z6; fi
>depmod: *** Unresolved symbols in
>/lib/modules/2.4.9-z6/kernel/arch/i386/kernel/apm.o
>depmod: __sysrq_unlock_table
Turn on CONFIG_MAGIC_SYSRQ for now, there will be a fix in a later kernel.
* Re: [BENCH] Problems with IO throughput and fairness with 2.4.10 and 2.4.9-ac15
2001-09-27 12:46 [BENCH] Problems with IO throughput and fairness with 2.4.10 and 2.4.9-ac15 Robert Cohen
2001-09-28 8:26 ` Stephan von Krawczynski
@ 2001-09-28 8:51 ` Gerold Jury
2001-09-28 10:27 ` Andrey Nekrasov
2001-10-12 8:24 ` Andrea Arcangeli
2 siblings, 1 reply; 11+ messages in thread
From: Gerold Jury @ 2001-09-28 8:51 UTC (permalink / raw)
To: Robert Cohen, linux-kernel
I have tried 2.4.9-xfs against 2.4.10-xfs with dbench.
The machine has 384 MB ram.
The throughput is roughly the same for both with dbench 2.
dbench 32 runs fine on 2.4.9-xfs but does not finish on 2.4.10-xfs.
dbench 24 will finish on 2.4.10 but it takes a very very long time.
All dbench processes are stuck in D state after 10 seconds.
I am not sure if it is the xfs part, the VM or both.
Can you give dbench 32 a try?
Regards
Gerold
On Thursday 27 September 2001 14:46, Robert Cohen wrote:
> Given the recent flurry of changes in the Linux kernel VM subsystems I
> decided to do a bit of benchmarking.
* Re: [BENCH] Problems with IO throughput and fairness with 2.4.10 and 2.4.9-ac15
2001-09-28 8:51 ` [BENCH] Problems with IO throughput and fairness with 2.4.10 and 2.4.9-ac15 Gerold Jury
@ 2001-09-28 10:27 ` Andrey Nekrasov
2001-09-28 12:48 ` Gerold Jury
0 siblings, 1 reply; 11+ messages in thread
From: Andrey Nekrasov @ 2001-09-28 10:27 UTC (permalink / raw)
To: linux-kernel
Hello Gerold Jury,
Once you wrote about "Re: [BENCH] Problems with IO throughput and fairness with 2.4.10 and 2.4.9-ac15":
> I have tried 2.4.9-xfs against 2.4.10-xfs with dbench.
> The machine has 384 MB ram.
IDE/SCSI/RAID Controller?
> The throughput is roughly the same for both with dbench 2.
> dbench 32 runs fine on 2.4.9-xfs but does not finish on 2.4.10-xfs.
> dbench 24 will finish on 2.4.10 but it takes a very very long time.
> All dbench processes are stuck in D state after 10 seconds.
>
> I am not sure if it is the xfs part, the VM or both.
>
> Can you give the dbench 32 a try ?
I ran "dbench 32"; all tests OK.
Kernel 2.4.10-xfs + 2.4.10.aa1 + preemptible patch.
File system on test partition ext2.
Compiled with no highmem support.
Hardware configuration:
Dell Optiplex G1 (P2-350/256RAM/IDE disk 1Gb)
--
bye.
Andrey Nekrasov, SpyLOG.
* Re: [BENCH] Problems with IO throughput and fairness with 2.4.10 and 2.4.9-ac15
2001-09-28 10:27 ` Andrey Nekrasov
@ 2001-09-28 12:48 ` Gerold Jury
2001-09-28 15:22 ` Steve Lord
2001-09-28 17:58 ` Steve Lord
0 siblings, 2 replies; 11+ messages in thread
From: Gerold Jury @ 2001-09-28 12:48 UTC (permalink / raw)
To: Andrey Nekrasov, linux-kernel
Thanks, nice to hear.
So it needs to be something stupid on my side, or xfs with the new VM.
By the way, it is an Athlon 1.1 (kernel compiled with Athlon optimisation),
IDE controller ATA66, disk IBM 15 GB ATA33.
The machine is solid with and without VIA pci bit 7 byte 55 zero/one.
Swap space 256MB.
The preemptible patch does not help with my D state problem.
I have not tried 2.4.10.aa1,
but I will try with ext2 instead of xfs next time.
Gerold
On Friday 28 September 2001 12:27, Andrey Nekrasov wrote:
> Hello Gerold Jury,
>
>
> I ran "dbench 32"; all tests OK.
> Kernel 2.4.10-xfs + 2.4.10.aa1 + preemptible patch.
> File system on test partition ext2.
>
> Compiled with no highmem support.
>
> Hardware configuration:
>
> Dell Optiplex G1 (P2-350/256RAM/IDE disk 1Gb)
* Re: [BENCH] Problems with IO throughput and fairness with 2.4.10 and 2.4.9-ac15
2001-09-28 12:48 ` Gerold Jury
@ 2001-09-28 15:22 ` Steve Lord
2001-09-28 17:58 ` Steve Lord
1 sibling, 0 replies; 11+ messages in thread
From: Steve Lord @ 2001-09-28 15:22 UTC (permalink / raw)
To: Gerold Jury; +Cc: Andrey Nekrasov, linux-kernel
> Thanks, nice to hear.
>
> So it needs to be something stupid on my side or xfs with the new VM.
I am working on a memory deadlock which has come up with XFS. It is not
new with the latest VM changes, but they certainly seem to make it more
likely to happen.
Stay tuned.
Steve
* Re: [BENCH] Problems with IO throughput and fairness with 2.4.10 and 2.4.9-ac15
2001-09-28 12:48 ` Gerold Jury
2001-09-28 15:22 ` Steve Lord
@ 2001-09-28 17:58 ` Steve Lord
2001-09-29 14:13 ` Gerold Jury
1 sibling, 1 reply; 11+ messages in thread
From: Steve Lord @ 2001-09-28 17:58 UTC (permalink / raw)
To: Gerold Jury; +Cc: Andrey Nekrasov, linux-kernel
> Thanks, nice to hear.
>
> So it needs to be something stupid on my side or xfs with the new VM.
>
> By the way, it is an Athlon 1.1 (kernel compiled with Athlon optimisation)
> IDE controller ATA66, disk IBM 15 GB ATA33
> The machine is solid with and without VIA pci bit 7 byte 55 zero/one
> swapspace 256MB
>
> The preemptible patch does not help with my D state problem
> i have not tried 2.4.10.aa1
> but i will try with ext2 instead of xfs next time
>
> Gerold
>
Hi,
Can you try XFS with this change, just to confirm you are seeing the same
problem I am seeing. I am not proposing this as a permanent fix yet,
just confirming what the deadlock is.
Thanks
Steve
===========================================================================
Index: linux/fs/inode.c
===========================================================================
--- /usr/tmp/TmpDir.21835-0/linux/fs/inode.c_1.53 Fri Sep 28 12:57:27 2001
+++ linux/fs/inode.c Fri Sep 28 10:17:49 2001
@@ -76,7 +76,7 @@
static kmem_cache_t * inode_cachep;
#define alloc_inode() \
- ((struct inode *) kmem_cache_alloc(inode_cachep, SLAB_KERNEL))
+ ((struct inode *) kmem_cache_alloc(inode_cachep, SLAB_NOFS))
static void destroy_inode(struct inode *inode)
{
if (inode_has_buffers(inode))
* Re: [BENCH] Problems with IO throughput and fairness with 2.4.10 and 2.4.9-ac15
2001-09-28 17:58 ` Steve Lord
@ 2001-09-29 14:13 ` Gerold Jury
0 siblings, 0 replies; 11+ messages in thread
From: Gerold Jury @ 2001-09-29 14:13 UTC (permalink / raw)
To: Steve Lord; +Cc: linux-kernel
On Friday 28 September 2001 19:58, Steve Lord wrote:
> Hi,
>
> Can you try XFS with this change, just to confirm you are seeing the same
> problem I am seeing. I am not proposing this as a permanent fix yet,
> just confirming what the deadlock is.
>
> Thanks
>
> Steve
>
The deadlock is gone. dbench 32 gives me
Throughput 1.55412 MB/sec (NB=1.94265 MB/sec 15.5412 MBit/sec) 32 procs
with 2.4.10-xfs + your patch
I will leave it this way.
Unfortunately I have a business trip to South Africa for at least 8 days
starting tomorrow, so I will not be able to do any further testing until
I return.
Thanks
Gerold
* Re: [BENCH] Problems with IO throughput and fairness with 2.4.10 and 2.4.9-ac15
2001-09-27 12:46 [BENCH] Problems with IO throughput and fairness with 2.4.10 and 2.4.9-ac15 Robert Cohen
2001-09-28 8:26 ` Stephan von Krawczynski
2001-09-28 8:51 ` [BENCH] Problems with IO throughput and fairness with 2.4.10 and 2.4.9-ac15 Gerold Jury
@ 2001-10-12 8:24 ` Andrea Arcangeli
2 siblings, 0 replies; 11+ messages in thread
From: Andrea Arcangeli @ 2001-10-12 8:24 UTC (permalink / raw)
To: Robert Cohen; +Cc: linux-kernel
On Thu, Sep 27, 2001 at 10:46:17PM +1000, Robert Cohen wrote:
> Overall, the total throughput is not that bad, but the fact that it
> achieves this by starving clients to let one client at a time proceed is
> completely unacceptable for a file server.
So the problem here is starvation, if I understand correctly.
This one isn't related to the VM, so it's normal that you don't see much
difference among the different VMs; it's more likely related either to
netatalk, to TCP, or to the I/O elevator.
Anyway, you can pretty well rule out the elevator by using elvtune -r 1 -w
1 /dev/hd[abcd] and seeing if the starvation goes away.
> poor throughput that is seen in this test I associate with poor elevator
> performance. If the elevator doesnt group requests enough you get disk
> behaviour like "small read, seek, small read, seek" instead of grouping
> things into large reads or multiple reads between seeks.
If you hear the seeks, that's very good for fairness; making the
elevator even more aggressive could only increase the starvation of some
client.
> The problem where one client gets all the bandwidth has to be some kind
> of livelock.
netatalk may be processing the I/O requests in an unfair manner, and if
the unfairness is introduced by netatalk, then no matter what TCP and the
I/O subsystem do, we can do nothing to fix it from the kernel side. OTOH
you said that in the "cached" test netatalk was providing fair
fileserving, but I'd still prefer it if you could reproduce this without
using netatalk; you can just use an rsh pipe to do the reads and writes
of the files over the network, for example, which should stress the TCP
and I/O subsystems the same way. If you can't reproduce it with rsh,
please file a report with the netatalk people.
I doubt it's the TCP congestion control; of course it's unfair too
across multiple streams, but I wouldn't expect it to generate fairness
results that bad.
> that they are just doing 8k reads and writes. The files are not opened
> O_SYNC and the file server processes aren't doing any fsync calls. This is
ok.
> supported by the fact that the performance is fine with 256 Megs of
> memory.
yes.
Andrea