public inbox for linux-kernel@vger.kernel.org
* mmap vs. O_DIRECT
@ 2004-11-10  0:05 Bill Davidsen
  2004-11-10 21:13 ` Robert Love
  2004-11-10 22:19 ` Alan Cox
  0 siblings, 2 replies; 8+ messages in thread
From: Bill Davidsen @ 2004-11-10  0:05 UTC (permalink / raw)
  To: linux-kernel

I have an application which does a lot of mmap to process its data. The
huge waitio time makes me think that mmap isn't doing direct i/o even
when things are aligned. Before I start poking the code, is there a
reason why direct is not the default for i/o in page-size transfers at
page-size file offsets? I don't have source code, but the parameters of
the mmap all seem to satisfy the alignment requirements.

I realize there may be a reason for forcing the i/o through kernel
buffers, or for not taking advantage of doing direct i/o whenever
possible; it just doesn't jump out at me.

-- 
    -bill davidsen (davidsen@tmr.com)
"The secret to procrastination is to put things off until the
  last possible moment - but no longer"  -me


* Re: mmap vs. O_DIRECT
  2004-11-10  0:05 mmap vs. O_DIRECT Bill Davidsen
@ 2004-11-10 21:13 ` Robert Love
  2004-11-11 14:50   ` Bill Davidsen
  2004-11-10 22:19 ` Alan Cox
  1 sibling, 1 reply; 8+ messages in thread
From: Robert Love @ 2004-11-10 21:13 UTC (permalink / raw)
  To: Bill Davidsen; +Cc: linux-kernel

On Tue, 2004-11-09 at 19:05 -0500, Bill Davidsen wrote:
> I have an application which does a lot of mmap to process its data. The
> huge waitio time makes me think that mmap isn't doing direct i/o even
> when things are aligned. Before I start poking the code, is there a
> reason why direct is not the default for i/o in page-size transfers at
> page-size file offsets? I don't have source code, but the parameters of
> the mmap all seem to satisfy the alignment requirements.
>
> I realize there may be a reason for forcing the i/o through kernel
> buffers, or for not taking advantage of doing direct i/o whenever
> possible; it just doesn't jump out at me.

Direct I/O (O_DIRECT) will almost assuredly increase I/O wait and
degrade I/O performance, not improve it.

I don't think direct I/O is what you want and I am sure that we don't
want aligned mmaps to not go through the page cache and be synchronous.

	Robert Love




* Re: mmap vs. O_DIRECT
  2004-11-10  0:05 mmap vs. O_DIRECT Bill Davidsen
  2004-11-10 21:13 ` Robert Love
@ 2004-11-10 22:19 ` Alan Cox
  1 sibling, 0 replies; 8+ messages in thread
From: Alan Cox @ 2004-11-10 22:19 UTC (permalink / raw)
  To: Bill Davidsen; +Cc: Linux Kernel Mailing List

On Wed, 2004-11-10 at 00:05, Bill Davidsen wrote:
> I have an application which does a lot of mmap to process its data. The
> huge waitio time makes me think that mmap isn't doing direct i/o even
> when things are aligned.

Make sure you are using MAP_SHARED in such cases so that the object you
have is the page cache object. Also remember to use madvise().
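
Illustrative sketch, not from the original message: the MAP_SHARED and
madvise suggestion above might look roughly like this, written in Python
for brevity (assuming Linux and Python 3.8+). The helper name
map_shared_random is hypothetical, and MADV_RANDOM is an assumed choice
of advice; the message does not say which madvise hint to use.

```python
import mmap
import os


def map_shared_random(path):
    """Map a whole file read-only with MAP_SHARED and hint random access.

    Hypothetical helper; the file must be non-empty.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.fstat(fd).st_size
        # MAP_SHARED: the mapping refers to the file's page cache pages
        # rather than to private copies of them.
        mm = mmap.mmap(fd, size, flags=mmap.MAP_SHARED, prot=mmap.PROT_READ)
    finally:
        # Python's mmap keeps its own duplicate of the descriptor.
        os.close(fd)
    # Assumed advice: pages will be referenced in random order, so
    # aggressive readahead is unlikely to help.
    if hasattr(mm, "madvise") and hasattr(mmap, "MADV_RANDOM"):
        mm.madvise(mmap.MADV_RANDOM)
    return mm
```

The I/O still goes through the page cache here; the mapping only avoids
an additional private copy of the data.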



* Re: mmap vs. O_DIRECT
  2004-11-10 21:13 ` Robert Love
@ 2004-11-11 14:50   ` Bill Davidsen
  2004-11-11 15:41     ` Robert Love
  0 siblings, 1 reply; 8+ messages in thread
From: Bill Davidsen @ 2004-11-11 14:50 UTC (permalink / raw)
  To: Robert Love; +Cc: linux-kernel

Robert Love wrote:
> On Tue, 2004-11-09 at 19:05 -0500, Bill Davidsen wrote:
> 
>>I have an application which does a lot of mmap to process its data. The
>>huge waitio time makes me think that mmap isn't doing direct i/o even
>>when things are aligned. Before I start poking the code, is there a
>>reason why direct is not the default for i/o in page-size transfers at
>>page-size file offsets? I don't have source code, but the parameters of
>>the mmap all seem to satisfy the alignment requirements.
>>
>>I realize there may be a reason for forcing the i/o through kernel
>>buffers, or for not taking advantage of doing direct i/o whenever
>>possible; it just doesn't jump out at me.
> 
> 
> Direct I/O (O_DIRECT) will almost assuredly increase I/O wait and
> degrade I/O performance, not improve it.
> 
Sorry, I have to totally disagree, based on a year's experience with 30+
Usenet servers which can be run with or without direct. Without direct,
the data for every access is copied through the system buffers before
reaching the user program. By using O_DIRECT, the reported waitio time
(at 400-500 users/server) dropped from 40+% to about 14%.

Since the same volume of data and the same number of i/o operations are
being done, I can't see how doing an extra copy could possibly do
anything good!
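
Illustrative sketch, not from the original message: a seek/read style
access with O_DIRECT, as described above, could look roughly like this
in Python (assuming Linux and Python 3.7+). The names BLOCK, _pread_into
and read_block are hypothetical, and the 4096-byte alignment is an
assumption; the real requirement depends on the device and filesystem.
The sketch falls back to a buffered read where O_DIRECT is rejected.

```python
import mmap
import os

BLOCK = 4096  # assumed alignment for offset, length and buffer address


def _pread_into(path, flags, buf, offset):
    """Open path with the given flags and read into buf at offset."""
    fd = os.open(path, flags)
    try:
        return os.preadv(fd, [buf], offset)
    finally:
        os.close(fd)


def read_block(path, block_no):
    """Read one BLOCK-sized chunk; returns (data, used_direct)."""
    offset = block_no * BLOCK
    # An anonymous mmap is page-aligned, which gives O_DIRECT a
    # suitably aligned user buffer to transfer into.
    buf = mmap.mmap(-1, BLOCK)
    try:
        if hasattr(os, "O_DIRECT"):
            try:
                n = _pread_into(path, os.O_RDONLY | os.O_DIRECT, buf, offset)
                return buf[:n], True
            except OSError:
                pass  # e.g. EINVAL on filesystems without O_DIRECT support
        n = _pread_into(path, os.O_RDONLY, buf, offset)
        return buf[:n], False
    finally:
        buf.close()
```

Whether the direct path actually reduces wait time depends on the
workload; the second element of the return value only reports which
path was taken.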

> I don't think direct I/O is what you want and I am sure that we don't
> want aligned mmaps to not go through the page cache and be synchronous.

Having seen the results in actual experience using seek/read access, I 
am interested in getting the same benefits from the application using 
mmap, preferably without rewriting the application to use direct access 
explicitly.

I miss your point about synchronous. With hundreds of clients doing
small reads against a 10TB database, the benefit of pushing them through
the page cache isn't obvious. No particular data are in memory long
enough to have much chance of being shared, so it looks like overhead to
me. Feel free to educate me.

I certainly DO want to put more users per server, and direct I/O has 
proven itself in actual use. I'm not sure why you think the double copy 
is a good thing, but I have good rea$on to want more users per server.

Alan: point on MAP_SHARED taken.

-- 
bill davidsen <davidsen@tmr.com>
   CTO TMR Associates, Inc
   Doing interesting things with small computers since 1979


* Re: mmap vs. O_DIRECT
  2004-11-11 14:50   ` Bill Davidsen
@ 2004-11-11 15:41     ` Robert Love
  2004-11-11 17:13       ` Robert Love
  0 siblings, 1 reply; 8+ messages in thread
From: Robert Love @ 2004-11-11 15:41 UTC (permalink / raw)
  To: Bill Davidsen; +Cc: linux-kernel

On Thu, 2004-11-11 at 09:50 -0500, Bill Davidsen wrote:

> I miss your point about synchronous. With hundreds of clients doing
> small reads against a 10TB database, the benefit of pushing them through
> the page cache isn't obvious. No particular data are in memory long
> enough to have much chance of being shared, so it looks like overhead to
> me. Feel free to educate me.

There is a difference between being synchronous and not going through
the page cache, although in Linux we don't really have the distinction.

> I certainly DO want to put more users per server, and direct I/O has 
> proven itself in actual use. I'm not sure why you think the double copy 
> is a good thing, but I have good rea$on to want more users per server.
> 
> Alan: point on MAP_SHARED taken.

BTW, Alan's point on MAP_SHARED is just that you can have the mmap
region and the page cached region be one and the same.  You still aren't
doing direct I/O.

Maybe that is ultimately what you want.

It is rare to see direct I/O perform better when you use it as normal
file I/O (i.e. without performing your own caching and scheduling), but
if you really do measure improvements, and if you never reaccess the
data (so the lack of cache is not a problem), then by all means use it.

But we still don't want to make normal mmaps be direct.

	Robert Love




* Re: mmap vs. O_DIRECT
  2004-11-11 15:41     ` Robert Love
@ 2004-11-11 17:13       ` Robert Love
  2004-11-11 17:19         ` Avi Kivity
  0 siblings, 1 reply; 8+ messages in thread
From: Robert Love @ 2004-11-11 17:13 UTC (permalink / raw)
  To: Bill Davidsen; +Cc: linux-kernel

On Thu, 2004-11-11 at 10:41 -0500, Robert Love wrote:

> There is a difference between being synchronous and not going through
> the page cache, although in Linux we don't really have the distinction.

Rereading this, I should clarify.  We definitely have the distinction.

In the case of direct I/O, you get synchronous behavior, no page caching,
and no use of buffers.  In my statement, I meant that you cannot separate
the "no page cache" attribute from the "synchronous" attribute.

But you can get synchronous I/O and still get the page cache, a la
O_SYNC.

The closest you can come to normal I/O without the page cache is by
doing posix_fadvise() to prune your cache pages after you touch them.
That is definitely not what you want.
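
Illustrative sketch, not from the original message: the posix_fadvise()
approach mentioned above, a normal buffered read followed by a request
to drop the cached pages, might look roughly like this in Python
(assuming Linux). The helper name read_and_drop is hypothetical.

```python
import os


def read_and_drop(path, offset, length):
    """Buffered read, then ask the kernel to drop that range from cache."""
    fd = os.open(path, os.O_RDONLY)
    try:
        # Ordinary read through the page cache.
        data = os.pread(fd, length, offset)
        # Advisory only: the kernel may still keep some pages, for
        # example ones that are dirty or mapped elsewhere.
        os.posix_fadvise(fd, offset, length, os.POSIX_FADV_DONTNEED)
        return data
    finally:
        os.close(fd)
```

The data is still copied through the page cache on every read, so this
only limits how long it stays cached; it does not remove the extra copy.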

	Robert Love




* Re: mmap vs. O_DIRECT
  2004-11-11 17:13       ` Robert Love
@ 2004-11-11 17:19         ` Avi Kivity
  2004-11-11 17:22           ` Robert Love
  0 siblings, 1 reply; 8+ messages in thread
From: Avi Kivity @ 2004-11-11 17:19 UTC (permalink / raw)
  To: Robert Love; +Cc: Bill Davidsen, linux-kernel

Robert Love wrote:

>The closest you can come to normal I/O without the page cache is by
>doing posix_fadvise() to prune your cache pages after you touch them.
>  
>
Or you can use aio with O_DIRECT.

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to panic.



* Re: mmap vs. O_DIRECT
  2004-11-11 17:19         ` Avi Kivity
@ 2004-11-11 17:22           ` Robert Love
  0 siblings, 0 replies; 8+ messages in thread
From: Robert Love @ 2004-11-11 17:22 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Bill Davidsen, linux-kernel

On Thu, 2004-11-11 at 19:19 +0200, Avi Kivity wrote:

> Or you can use aio with O_DIRECT.

Ah, indeed.  I was thinking from the kernel's perspective.

	Robert Love



