From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id ; Sat, 11 May 2002 14:19:44 -0400 Received: (majordomo@vger.kernel.org) by vger.kernel.org id ; Sat, 11 May 2002 14:19:44 -0400 Received: from bitmover.com ([192.132.92.2]:23718 "EHLO bitmover.com") by vger.kernel.org with ESMTP id ; Sat, 11 May 2002 14:19:42 -0400 Date: Sat, 11 May 2002 11:19:35 -0700 From: Larry McVoy To: Linus Torvalds Cc: Gerrit Huizenga , Lincoln Dale , Andrew Morton , Alan Cox , Martin Dalecki , Padraig Brady , Anton Altaparmakov , Kernel Mailing List Subject: Re: O_DIRECT performance impact on 2.4.18 (was: Re: [PATCH] 2.5.14 IDE 56) Message-ID: <20020511111935.B30126@work.bitmover.com> Mail-Followup-To: Larry McVoy , Linus Torvalds , Gerrit Huizenga , Lincoln Dale , Andrew Morton , Alan Cox , Martin Dalecki , Padraig Brady , Anton Altaparmakov , Kernel Mailing List In-Reply-To: Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5.1i Sender: linux-kernel-owner@vger.kernel.org X-Mailing-List: linux-kernel@vger.kernel.org On Sat, May 11, 2002 at 11:04:45AM -0700, Linus Torvalds wrote: > The thing that has always disturbed me about O_DIRECT is that the whole > interface is just stupid, and was probably designed by a deranged monkey > on some serious mind-controlling substances [*]. > > I bet you could get _better_ performance more cleanly by splitting up the > actual IO generation and the "user-space mapping" thing sanely. For > example, if you want to do an O_DIRECT read into a buffer, there is no > reason why it shouldn't be done in two phases: You're only halfway right. You want to avoid the mmap altogether. To see why, postulate that you have infinitely fast I/O devices (I know that's not true but it's close enough if you get enough DMA channels going at once, it doesn't take very many to saturate memory). For any server application, now all your time is in the mmap(). And there is no need for it in general, it's just there because the upper layer of the system is too lame to handle real page frames. Go read the splice notes, ftp://bitmover.com/pub/splice.ps because those were written after we had tuned things enough in IRIX that it was the VM manipulations that became the bottleneck. Another way to think of it is this: figure out how fast the hardware could move the data. Now make it go that fast. Unless you can hide all the VM crud somehow, you won't achieve 100% of the hardware's capability. I know I've done a bad job explaining the splice crud, but there is some pretty cool stuff in there, if you really got it, you'd see how the server stuff, the database stuff, the aio stuff, all I/O of any kind can be done in terms of the splice:pull() and splice:push() interfaces and that it is the absolute lowest cost way to have a generic I/O layer. -- --- Larry McVoy lm at bitmover.com http://www.bitmover.com/lm