public inbox for linux-kernel@vger.kernel.org
From: Andrea Arcangeli <andrea@suse.de>
To: Lincoln Dale <ltd@cisco.com>
Cc: Andrew Morton <akpm@zip.com.au>,
	Kernel Mailing List <linux-kernel@vger.kernel.org>
Subject: Re: O_DIRECT performance impact on 2.4.18 (was: Re: [PATCH] 2.5.14IDE  56)
Date: Fri, 10 May 2002 14:36:16 +0200
Message-ID: <20020510143616.C13730@dualathlon.random>
In-Reply-To: <3CDAC4EB.FC4FE5CF@zip.com.au> <5.1.0.14.2.20020510155122.02d97910@mira-sjcm-3.cisco.com> <5.1.0.14.2.20020510191214.018915f0@mira-sjcm-3.cisco.com>

On Fri, May 10, 2002 at 08:14:10PM +1000, Lincoln Dale wrote:
> At 12:15 AM 10/05/2002 -0700, Andrew Morton wrote:
> >Try it with the block-highmem patch:
> >
> >http://www.kernel.org/pub/linux/kernel/people/andrea/kernels/v2.4/2.4.19pre1aa1/00_block-highmem-all-18b-4.gz
> 
> given i had to recompile the kernel to add lockmeter, i'd already cheated 
> and changed PAGE_OFFSET from 0xc0000000 to 0x80000000, obviating the 
> requirement for highmem altogether.
> 
> being fair to O_DIRECT and giving it 1mbyte disk-reads to work with and 
> giving normal i/o 8kbyte reads to work with.
> still using 2.4.18 with profile=2 enabled and lockmeter in the kernel but 
> not turned on.  still using the same disk spindles (just 6 this time), each 
> a 18G 15K RPM disk spindle.
> i got tired of scanning the entire available space on an 18G disk so just 
> dropped the test down to the first 2G of each disk.

Are any of the disks mounted?

> 
> O_DIRECT is still a ~30% performance hit versus just talking to the 
> /dev/sdX device directly.  profile traces at bottom.
> 
> normal block-device disks sd[m-r] without O_DIRECT, 64K x 8k reads:
>         [root@mel-stglab-host1 src]# readprofile -r; 
> ./test_disk_performance blocks=64K bs=8k /dev/sd[m-r]
>         Completed reading 12000 mbytes in 125.028612 seconds (95.98 
> Mbytes/sec), 76usec mean

Can you post your test_disk_performance program, so that we (and I in
particular) can see the semantics of blocks and bs? 64k*8k == 5k * 1M / 10,
so on the face of it the two runs don't ask for the same amount of data.
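
For reference, the arithmetic above can be checked with a short sketch.
It assumes that k/m suffixes are binary multipliers and that each device
is asked for blocks * bs bytes; both are guesses about
test_disk_performance, whose source has not been posted, and the helper
names are made up.

```python
# Hypothetical reading of the benchmark arguments: "blocks" is a count,
# "bs" is a size in bytes, and k/m/g suffixes are binary multipliers.
SUFFIXES = {"k": 1024, "m": 1024**2, "g": 1024**3}


def parse_size(text):
    """Turn '64K', '8k' or '1m' into an integer."""
    text = text.strip().lower()
    if text and text[-1] in SUFFIXES:
        return int(text[:-1]) * SUFFIXES[text[-1]]
    return int(text)


def bytes_per_device(blocks, bs):
    """Bytes requested per device, assuming total = blocks * bs."""
    return parse_size(blocks) * parse_size(bs)


buffered = bytes_per_device("64K", "8k")  # the run without O_DIRECT
direct = bytes_per_device("5K", "1m")     # the O_DIRECT run
print(buffered, direct, direct // buffered)
```

Under these assumptions the O_DIRECT run requests ten times more data
per device than the buffered run, which is why the exact semantics
matter.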

> 
> normal block-device disks sd[m-r] with O_DIRECT, 5K x 1 megabyte reads:
>         [root@mel-stglab-host1 src]# readprofile -r; 
> ./test_disk_performance blocks=5K bs=1m direct /dev/sd[m-r]
>         Completed reading 12000 mbytes in 182.492975 seconds (65.76 
> Mbytes/sec), 15416usec mean
> 
> for interests-sake, compare this to using the 'raw' versions of the same 
> disks:
>         [root@mel-stglab-host1 src]# readprofile -r; 
> ./test_disk_performance blocks=5K bs=1m /dev/raw/raw[2-7]
>         Completed reading 12000 mbytes in 206.346371 seconds (58.15 
> Mbytes/sec), 16860usec mean

O_DIRECT has to do some more work to stay coherent with the pagecache,
and it has some more overhead in the address space operations. But
O_DIRECT uses the blocksize of the blkdev, which is 1k by default (if
you never mounted it), versus the hardblocksize of 512 bytes used by
the raw device (assuming the sd[m-r] aren't mounted).

This is most probably why O_DIRECT is faster than raw.c: with twice the
blocksize it needs half as many bh for the same request. Otherwise they
would run at almost the same rate; the pagecache coherency fast paths
and the address space ops overhead of O_DIRECT shouldn't be noticeable.
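
To put rough numbers on the blocksize difference, here is a small
sketch. It assumes that each bh covers exactly one block of the
device's current blocksize (the 2.4 model without vary-io); the function
name is made up for illustration.

```python
def buffer_heads_per_request(request_bytes, blocksize):
    """Number of bh needed if every bh covers exactly one block."""
    if request_bytes % blocksize:
        raise ValueError("request must be a multiple of the blocksize")
    return request_bytes // blocksize


REQUEST = 1024 * 1024  # one 1 Mbyte read, as in the benchmark

odirect_bh = buffer_heads_per_request(REQUEST, 1024)  # blkdev blocksize of 1k
raw_bh = buffer_heads_per_request(REQUEST, 512)       # 512 byte hardblocksize
print(odirect_bh, raw_bh)
```

With these figures the raw device has to submit twice as many bh as
O_DIRECT for every 1 Mbyte read.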

> 
> of course, these are all ~25% worse than if a mechanism of performing the 
> i/o avoiding the copy_to_user() altogether:
>         [root@mel-stglab-host1 src]# readprofile -r; 
> ./test_disk_performance blocks=64K bs=8k nocopy /dev/sd[m-r]
>         Completed reading 12000 mbytes in 97.846938 seconds (122.64 
> Mbytes/sec), 59usec mean

The nocopy hack is not an interesting test for O_DIRECT/rawio: it
doesn't walk pagetables and it doesn't allow the DMA to be done into
userspace memory. If you want the pagecache to be visible in userspace
(i.e. MAP_PRIVATE/MAP_SHARED) you must deal with pagetables somehow,
and if you want the read/write syscalls to DMA directly into userspace
memory (raw/O_DIRECT) you must still walk pagetables during those
syscalls before starting the DMA. If you don't want to deal with the
pagetables explicitly then you need to copy_user (case 1). On most
archs, where mem bandwidth is very expensive, avoiding the copy-user is
a big global win (the other cpus won't collapse in smp, etc.).
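
From the userspace side, the raw/O_DIRECT case looks roughly like the
sketch below: the buffer handed to the read syscall has to be suitably
aligned so that the I/O can go straight into it. The helper names, the
4096 byte alignment and the fallback to buffered I/O on filesystems that
refuse O_DIRECT are all assumptions made for illustration.

```python
import mmap
import os
import tempfile

ALIGN = 4096  # assumed alignment; the real constraint depends on kernel and device


def _read_into_aligned(path, length, flags):
    """Read up to `length` bytes from offset 0 into a page-aligned buffer."""
    buf = mmap.mmap(-1, length)      # anonymous mapping, page aligned
    try:
        fd = os.open(path, flags)
        try:
            n = os.readv(fd, [buf])  # with O_DIRECT the I/O targets buf directly
        finally:
            os.close(fd)
        return buf[:n]               # copy out as bytes before unmapping
    finally:
        buf.close()


def read_direct(path, length):
    """Return (data, used_direct); fall back to buffered I/O if O_DIRECT fails."""
    if length <= 0 or length % ALIGN:
        raise ValueError("length must be a positive multiple of ALIGN")
    direct_flag = getattr(os, "O_DIRECT", 0)  # only defined on some platforms
    if direct_flag:
        try:
            data = _read_into_aligned(path, length, os.O_RDONLY | direct_flag)
            return data, True
        except OSError:
            pass  # e.g. EINVAL from a filesystem without O_DIRECT support
    return _read_into_aligned(path, length, os.O_RDONLY), False


# Small self-check against a temporary file instead of a real disk.
payload = os.urandom(2 * ALIGN)
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(payload)
data, used_direct = read_direct(tmp.name, len(payload))
os.unlink(tmp.name)
print(len(data), "bytes read, O_DIRECT used:", used_direct)
```

Whether the O_DIRECT path is actually taken depends on the filesystem
under the temporary directory, so the sketch reports which path it used
rather than assuming one.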

Your nocopy hack benchmark is relevant only when the data is used by
the kernel. So if it is the kernel that reads the data directly from
pagecache (i.e. a kernel module), then your nocopy benchmark matters.
For example, it also matters for sendfile zerocopy, which will read at
122M/sec. But if it is userspace that is supposed to receive the data
(so not directly from pagecache through the kernel direct mapping, but
in userspace mapped memory) it cannot be 122M/sec; it has to be less
due to the user address space management.

> 
> 
> anyone want to see any other benchmarks performed?  would a comparison to 
> 2.5.x be useful?
> 
> 
> comparative profile=2 traces:
>  - no O_DIRECT:
>         [root@mel-stglab-host1 src]# readprofile -v | sort -n -k3 | tail -10
>         80125060 _spin_lock_                                 718   6.4107
>         8013bfc0 brw_kiovec                                  798   0.9591
>         801cbb40 generic_make_request                        830   2.8819
>         801f9400 scsi_init_io_vc                             831   2.2582
>         8013c840 try_to_free_buffers                        1198   3.4034
>         8013a190 end_buffer_io_async                        2453  12.7760
>         8012b100 file_read_actor                            3459  36.0312
>         801cb4e0 __make_request                             7532   4.6152
>         80105220 default_idle                             106468 1663.5625
>         00000000 total                                    134102   0.0726
> 
>  - O_DIRECT, disks /dev/sd[m-r]:
>         [root@mel-stglab-host1 src]# readprofile -v | sort -n -k3 | tail -10
>         801cbb40 generic_make_request                         72   0.2500
>         8013ab00 set_bh_page                                  73   1.1406
>         801cbc60 submit_bh                                   116   1.0357
>         801f72a0 __scsi_end_request                          133   0.4618
>         80139540 unlock_buffer                               139   1.7375
>         8013bf10 end_buffer_io_kiobuf                        302   4.7188
>         8013bfc0 brw_kiovec                                  357   0.4291
>         801cb4e0 __make_request                              995   0.6097
>         80105220 default_idle                              34243 535.0469
>         00000000 total                                     37101   0.0201
> 
>  - /dev/raw/raw[2-7]:
>         [root@mel-stglab-host1 src]# readprofile -v | sort -n -k3 | tail -10
>         8013bf50 wait_kio                                    349   3.1161
>         801cbb40 generic_make_request                        461   1.6007
>         801cbc60 submit_bh                                   526   4.6964
>         80139540 unlock_buffer                               666   8.3250
>         801f72a0 __scsi_end_request                          699   2.4271
>         8013bf10 end_buffer_io_kiobuf                       1672  26.1250
>         8013bfc0 brw_kiovec                                 1906   2.2909
>         801cb4e0 __make_request                            10495   6.4308
>         80105220 default_idle                              84418 1319.0312
>         00000000 total                                    103516   0.0560
> 
>  - O_NOCOPY hack: (userspace doesn't actually get the read data)
>         801f9400 scsi_init_io_vc                             785   2.1332
>         8013c840 try_to_free_buffers                         950   2.6989
>         801f72a0 __scsi_end_request                          966   3.3542
>         801cbb40 generic_make_request                       1017   3.5312
>         8013bf10 end_buffer_io_kiobuf                       1672  26.1250
>         8013a190 end_buffer_io_async                        1693   8.8177
>         8013bfc0 brw_kiovec                                 1906   2.2909
>         801cb4e0 __make_request                            13682   8.3836
>         80105220 default_idle                             112345 1755.3906
>         00000000 total                                    144891   0.0784

Can you use -k4? -k3 sorts on the number of hits per function, but we
should take the size of the function into account too. Otherwise small
functions won't show up.
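
As an illustration of the difference, the sketch below sorts a few
lines copied from the "no O_DIRECT" profile above on both columns. The
parsing assumes the four-column layout shown there (address, name,
hits, load); the variable and function names are made up.

```python
# Lines copied from the "no O_DIRECT" readprofile output quoted above.
PROFILE = """\
80125060 _spin_lock_                                 718   6.4107
8013c840 try_to_free_buffers                        1198   3.4034
8013a190 end_buffer_io_async                        2453  12.7760
8012b100 file_read_actor                            3459  36.0312
801cb4e0 __make_request                             7532   4.6152
"""


def parse(text):
    """Return a list of (name, hits, load) tuples."""
    rows = []
    for line in text.splitlines():
        _addr, name, hits, load = line.split()
        rows.append((name, int(hits), float(load)))
    return rows


rows = parse(PROFILE)
by_hits = sorted(rows, key=lambda r: r[1], reverse=True)  # like sort -n -k3
by_load = sorted(rows, key=lambda r: r[2], reverse=True)  # like sort -n -k4

print("top by hits:", by_hits[0][0])
print("top by load:", by_load[0][0])
```

On these five lines __make_request leads on raw hits, while
file_read_actor leads once the hits are normalized by function size.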

Can you also give the same benchmark a spin with 2.4.19pre8aa2? It has
the vary-io stuff from Badari and further kiobuf optimizations from
Chuck. (vary-io will work only with aic and qlogic; enabling it is a
one-liner if the driver is already ok with a variable bh->b_size within
the same I/O request.) The right fix for avoiding the flood of small bh
is bio in 2.5; for 2.4 vary-io should be fine.

thanks,

Andrea
