From: Andrea Arcangeli <andrea@suse.de>
To: Lincoln Dale <ltd@cisco.com>
Cc: Andrew Morton <akpm@zip.com.au>,
Kernel Mailing List <linux-kernel@vger.kernel.org>
Subject: Re: O_DIRECT performance impact on 2.4.18 (was: Re: [PATCH] 2.5.14IDE 56)
Date: Fri, 10 May 2002 14:36:16 +0200
Message-ID: <20020510143616.C13730@dualathlon.random>
In-Reply-To: <3CDAC4EB.FC4FE5CF@zip.com.au> <5.1.0.14.2.20020510155122.02d97910@mira-sjcm-3.cisco.com> <5.1.0.14.2.20020510191214.018915f0@mira-sjcm-3.cisco.com>
On Fri, May 10, 2002 at 08:14:10PM +1000, Lincoln Dale wrote:
> At 12:15 AM 10/05/2002 -0700, Andrew Morton wrote:
> >Try it with the block-highmem patch:
> >
> >http://www.kernel.org/pub/linux/kernel/people/andrea/kernels/v2.4/2.4.19pre1aa1/00_block-highmem-all-18b-4.gz
>
> given i had to recompile the kernel to add lockmeter, i'd already cheated
> and changed PAGE_OFFSET from 0xc0000000 to 0x80000000, obviating the
> requirement for highmem altogether.
>
> being fair to O_DIRECT and giving it 1mbyte disk-reads to work with and
> giving normal i/o 8kbyte reads to work with.
> still using 2.4.18 with profile=2 enabled and lockmeter in the kernel but
> not turned on. still using the same disk spindles (just 6 this time), each
> a 18G 15K RPM disk spindle.
> i got tired of scanning the entire available space on an 18G disk so just
> dropped the test down to the first 2G of each disk.
Are any of the disks mounted?
>
> O_DIRECT is still a ~30% performance hit versus just talking to the
> /dev/sdX device directly. profile traces at bottom.
>
> normal block-device disks sd[m-r] without O_DIRECT, 64K x 8k reads:
> [root@mel-stglab-host1 src]# readprofile -r;
> ./test_disk_performance blocks=64K bs=8k /dev/sd[m-r]
> Completed reading 12000 mbytes in 125.028612 seconds (95.98
> Mbytes/sec), 76usec mean
Can you post your test_disk_performance program, so that we can see in
particular the semantics of blocks and bs? 64k*8k == 5k * 1M / 10.
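In case it helps, this is the kind of semantics I am guessing at (a
hypothetical sketch only, the real test_disk_performance wasn't posted,
so the blocks/bs/direct arguments and the output format here are my
assumptions, not the actual program):

```python
# Hypothetical sketch of what test_disk_performance might do: perform
# "blocks" reads of "bs" bytes from each path and report Mbytes/sec plus
# the mean per-read latency.  Every name and flag here is an assumption.
import os
import time
import mmap

def bench(paths, blocks, bs, direct=False):
    flags = os.O_RDONLY | (getattr(os, "O_DIRECT", 0) if direct else 0)
    # O_DIRECT needs a suitably aligned buffer; an anonymous mmap is
    # page-aligned, which satisfies the 512-byte/1k alignment rules.
    buf = mmap.mmap(-1, bs)
    total = 0
    t0 = time.time()
    for path in paths:
        fd = os.open(path, flags)
        for _ in range(blocks):
            n = os.readv(fd, [buf])
            if n == 0:        # stop at end of device/file
                break
            total += n
        os.close(fd)
    elapsed = max(time.time() - t0, 1e-9)
    mb = total / (1024 * 1024)
    mean_us = elapsed / max(blocks * len(paths), 1) * 1e6
    print("Completed reading %d mbytes in %f seconds (%.2f Mbytes/sec), "
          "%dusec mean" % (mb, elapsed, mb / elapsed, mean_us))
    return total
```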
>
> normal block-device disks sd[m-r] with O_DIRECT, 5K x 1 megabyte reads:
> [root@mel-stglab-host1 src]# readprofile -r;
> ./test_disk_performance blocks=5K bs=1m direct /dev/sd[m-r]
> Completed reading 12000 mbytes in 182.492975 seconds (65.76
> Mbytes/sec), 15416usec mean
>
> for interests-sake, compare this to using the 'raw' versions of the same
> disks:
> [root@mel-stglab-host1 src]# readprofile -r;
> ./test_disk_performance blocks=5K bs=1m /dev/raw/raw[2-7]
> Completed reading 12000 mbytes in 206.346371 seconds (58.15
> Mbytes/sec), 16860usec mean
O_DIRECT has to do some more work to check for coherency with the
pagecache, and it has some extra overhead in the address space
operations, but O_DIRECT by default uses the blocksize of the blkdev,
which defaults to 1k (if you never mounted it), versus the 512-byte
hardblocksize used by the raw device (assuming sd[m-r] aren't mounted).
That is most probably why O_DIRECT is faster than raw.c here; otherwise
they would run at almost the same rate, since the pagecache coherency
fast paths and the address space ops overhead of O_DIRECT shouldn't be
noticeable.
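To make the blocksize effect explicit: for the same 1 mbyte request, a
1k soft blocksize queues half as many buffer heads as raw's fixed
512-byte units. A trivial illustration (the function name is made up):

```python
# Hypothetical illustration of the soft-blocksize effect: in 2.4 the
# request is split into buffer heads of the device blocksize, so the
# soft blocksize directly sets how many bh's a given request generates.
def bh_count(request_bytes, blocksize):
    # O_DIRECT refuses i/o that isn't a multiple of the blocksize.
    if request_bytes % blocksize:
        raise ValueError("request not aligned to blocksize")
    return request_bytes // blocksize

# a 1 mbyte read, as in the benchmark above:
print(bh_count(1 << 20, 1024))  # O_DIRECT, blkdev never mounted -> 1024
print(bh_count(1 << 20, 512))   # raw device, hardblocksize      -> 2048
```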
>
> of course, these are all ~25% worse than if a mechanism of performing the
> i/o avoiding the copy_to_user() altogether:
> [root@mel-stglab-host1 src]# readprofile -r;
> ./test_disk_performance blocks=64K bs=8k nocopy /dev/sd[m-r]
> Completed reading 12000 mbytes in 97.846938 seconds (122.64
> Mbytes/sec), 59usec mean
The nocopy hack is not an interesting test for O_DIRECT/rawio: it
doesn't walk pagetables and it doesn't allow the DMA to be done into
userspace memory. If you want the pagecache to be visible in userspace
(i.e. MAP_PRIVATE/MAP_SHARED) you must deal with pagetables somehow,
and if you want the read/write syscalls to DMA directly into userspace
memory (raw/O_DIRECT) you must still walk pagetables during those
syscalls before starting the DMA. If you don't want to deal with the
pagetables explicitly, then you need copy_user (case 1). On most archs,
where memory bandwidth is very expensive, avoiding the copy-user is a
big global win (the other cpus won't collapse under SMP, etc.).
Your nocopy benchmark is relevant only for usages of the data done by
the kernel itself. So if it is the kernel that reads the data directly
from the pagecache (i.e. a kernel module), then your nocopy benchmark
matters. For example it also matters for sendfile zerocopy, which will
read at 122M/sec. But if it is userspace that is supposed to receive
the data (so not directly from the pagecache in the kernel direct
mapping, but in userspace mapped memory), it cannot be 122M/sec; it has
to be less, due to the user address space management.
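A userspace analogy of the difference, purely for illustration (the
real cost is copy_to_user out of the pagecache, which this obviously
doesn't reproduce, but the shape is the same: one path touches the data
twice, the other once):

```python
# Illustration of the copy vs no-copy distinction discussed above.
import io

def read_with_copy(f, dest):
    # analogous to a pagecache read plus copy_to_user:
    # the data crosses memory twice (bounce buffer, then destination)
    bounce = f.read(len(dest))
    dest[:len(bounce)] = bounce
    return len(bounce)

def read_no_copy(f, dest):
    # analogous to DMA straight into the user buffer (raw/O_DIRECT):
    # the data lands in the destination in one pass
    return f.readinto(dest)

data = b"\xab" * 65536
dest1 = bytearray(len(data))
dest2 = bytearray(len(data))
read_with_copy(io.BytesIO(data), dest1)
read_no_copy(io.BytesIO(data), dest2)
assert dest1 == dest2 == bytearray(data)
```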
>
>
> anyone want to see any other benchmarks performed? would a comparison to
> 2.5.x be useful?
>
>
> comparative profile=2 traces:
> - no O_DIRECT:
> [root@mel-stglab-host1 src]# readprofile -v | sort -n -k3 | tail -10
> 80125060 _spin_lock_ 718 6.4107
> 8013bfc0 brw_kiovec 798 0.9591
> 801cbb40 generic_make_request 830 2.8819
> 801f9400 scsi_init_io_vc 831 2.2582
> 8013c840 try_to_free_buffers 1198 3.4034
> 8013a190 end_buffer_io_async 2453 12.7760
> 8012b100 file_read_actor 3459 36.0312
> 801cb4e0 __make_request 7532 4.6152
> 80105220 default_idle 106468 1663.5625
> 00000000 total 134102 0.0726
>
> - O_DIRECT, disks /dev/sd[m-r]:
> [root@mel-stglab-host1 src]# readprofile -v | sort -n -k3 | tail -10
> 801cbb40 generic_make_request 72 0.2500
> 8013ab00 set_bh_page 73 1.1406
> 801cbc60 submit_bh 116 1.0357
> 801f72a0 __scsi_end_request 133 0.4618
> 80139540 unlock_buffer 139 1.7375
> 8013bf10 end_buffer_io_kiobuf 302 4.7188
> 8013bfc0 brw_kiovec 357 0.4291
> 801cb4e0 __make_request 995 0.6097
> 80105220 default_idle 34243 535.0469
> 00000000 total 37101 0.0201
>
> - /dev/raw/raw[2-7]:
> [root@mel-stglab-host1 src]# readprofile -v | sort -n -k3 | tail -10
> 8013bf50 wait_kio 349 3.1161
> 801cbb40 generic_make_request 461 1.6007
> 801cbc60 submit_bh 526 4.6964
> 80139540 unlock_buffer 666 8.3250
> 801f72a0 __scsi_end_request 699 2.4271
> 8013bf10 end_buffer_io_kiobuf 1672 26.1250
> 8013bfc0 brw_kiovec 1906 2.2909
> 801cb4e0 __make_request 10495 6.4308
> 80105220 default_idle 84418 1319.0312
> 00000000 total 103516 0.0560
>
> - O_NOCOPY hack: (userspace doesn't actually get the read data)
> 801f9400 scsi_init_io_vc 785 2.1332
> 8013c840 try_to_free_buffers 950 2.6989
> 801f72a0 __scsi_end_request 966 3.3542
> 801cbb40 generic_make_request 1017 3.5312
> 8013bf10 end_buffer_io_kiobuf 1672 26.1250
> 8013a190 end_buffer_io_async 1693 8.8177
> 8013bfc0 brw_kiovec 1906 2.2909
> 801cb4e0 __make_request 13682 8.3836
> 80105220 default_idle 112345 1755.3906
> 00000000 total 144891 0.0784
Can you use -k4 (i.e. readprofile -v | sort -n -k4 | tail -10)? -k3
sorts by the raw number of hits per function, but we should take the
size of each function into account too; otherwise small functions won't
show up.
Can you also give the same benchmark a spin with 2.4.19pre8aa2? It has
the vary-io stuff from Badari and further kiobuf optimizations from
Chuck. (vary-io only works with aic and qlogic; enabling it is a
one-liner if the driver is fine with a variable bh->b_size within the
same I/O request.) The right fix for avoiding the flood of small bh's
is bio in 2.5; for 2.4, vary-io should be fine.
thanks,
Andrea