public inbox for linux-kernel@vger.kernel.org
From: Chris Mason <chris.mason@oracle.com>
To: Linus Torvalds <torvalds@osdl.org>
Cc: dean gaudet <dean@arctic.org>, Viktor <vvp01@inbox.ru>,
	Aubrey <aubreylee@gmail.com>, Hua Zhong <hzhong@gmail.com>,
	Hugh Dickins <hugh@veritas.com>,
	linux-kernel@vger.kernel.org, hch@infradead.org,
	kenneth.w.chen@intel.com, akpm@osdl.org, mjt@tls.msk.ru
Subject: Re: O_DIRECT question
Date: Fri, 12 Jan 2007 15:23:16 -0500
Message-ID: <20070112202316.GA28400@think.oraclecorp.com>
In-Reply-To: <Pine.LNX.4.64.0701120955440.3594@woody.osdl.org>

On Fri, Jan 12, 2007 at 10:06:22AM -0800, Linus Torvalds wrote:

> > looking at the splice(2) api it seems like it'll be difficult to implement 
> > O_DIRECT pread/pwrite from userland using splice... so there'd need to be 
> > some help there.
> 
> You'd use vmsplice() to put the write buffers into kernel space (user 
> space sees it's a pipe file descriptor, but you should just ignore that: 
> it's really just a kernel buffer). And then splice the resulting kernel 
> buffers to the destination.

I recently spent some time trying to integrate O_DIRECT locking with
page cache locking.  The basic theory is that instead of using
semaphores for solving O_DIRECT vs buffered races, you put something
into the radix tree (I call it a placeholder) to keep the page cache
users out, and lock any existing pages that are present.

O_DIRECT saves CPU by avoiding copies, but it also saves CPU by doing
fewer radix tree operations during massive IOs.  Radix tree
insertion/deletion on 1MB O_DIRECT IOs added ~10% system time on
my tiny little dual core box.  I'm sure it would be much worse with
lock contention on a big NUMA machine, and the cost grows as the IO
grows (SGI does massive O_DIRECT IOs).

To help reduce radix tree churn, I made it possible for a single
placeholder entry to lock down a range in the radix tree:

http://thread.gmane.org/gmane.linux.file-systems/12263
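
A minimal user-space sketch of why a range placeholder cuts the churn,
assuming 4 KiB pages.  The names (struct placeholder, pages_spanned,
placeholder_for, placeholder_covers) are hypothetical and this is not
the code from the patch linked above; it only counts how many per-page
slots one IO would otherwise touch and shows one entry standing in for
all of them.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SHIFT 12 /* assumption: 4 KiB pages */

/* Hypothetical stand-in for a range placeholder: one entry, many pages. */
struct placeholder {
    unsigned long start; /* first page index covered */
    unsigned long end;   /* last page index covered, inclusive */
};

/* Number of page-cache slots that a byte range [off, off + len) touches. */
static unsigned long pages_spanned(unsigned long long off,
                                   unsigned long long len)
{
    assert(len > 0);
    unsigned long long first = off >> PAGE_SHIFT;
    unsigned long long last = (off + len - 1) >> PAGE_SHIFT;
    return (unsigned long)(last - first + 1);
}

/* Build the single placeholder that covers the whole byte range. */
static struct placeholder placeholder_for(unsigned long long off,
                                          unsigned long long len)
{
    struct placeholder p;
    assert(len > 0);
    p.start = (unsigned long)(off >> PAGE_SHIFT);
    p.end = (unsigned long)((off + len - 1) >> PAGE_SHIFT);
    return p;
}

/* Would a page-cache lookup at this index be kept out by the placeholder? */
static int placeholder_covers(const struct placeholder *p, unsigned long index)
{
    return index >= p->start && index <= p->end;
}
```

With these assumptions, an aligned 1MB IO spans 256 page indexes, so
per-page locking means 256 insertions and 256 deletions where the range
placeholder needs one of each.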

It looks to me as though vmsplice is going to have the same issues as my
early patches.  The current splice code can avoid the copy but still
works in page-sized chunks.  Also, splice doesn't support zero copy on
anything smaller than a page.
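
For reference, the vmsplice()/splice() sequence quoted above can be
sketched from userland roughly as follows.  This is only an illustration
of the call order; the helper name write_via_splice is hypothetical, the
error handling is minimal, and it assumes out_fd is a regular file that
was not opened with O_APPEND.  Each pass queues at most one pipe's worth
of user pages, which is the page-sized chunking discussed above.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Hypothetical helper: push a user buffer to out_fd through a pipe.
 * Returns 0 on success, -1 on error (errno is set by the failing call).
 */
static int write_via_splice(int out_fd, const char *buf, size_t len)
{
    int pfd[2];
    int rc = 0;

    assert(out_fd >= 0);
    if (pipe(pfd) < 0)
        return -1;

    while (len > 0 && rc == 0) {
        struct iovec iov = { .iov_base = (void *)buf, .iov_len = len };

        /* Map the user pages into the pipe; may queue less than len. */
        ssize_t queued = vmsplice(pfd[1], &iov, 1, 0);
        if (queued <= 0) {
            rc = -1;
            break;
        }
        buf += queued;
        len -= (size_t)queued;

        /* Drain what was queued from the pipe into the destination. */
        while (queued > 0) {
            ssize_t moved = splice(pfd[0], NULL, out_fd, NULL,
                                   (size_t)queued, SPLICE_F_MOVE);
            if (moved <= 0) {
                rc = -1;
                break;
            }
            queued -= moved;
        }
    }

    close(pfd[0]);
    close(pfd[1]);
    return rc;
}
```

A caller would open the destination and call
write_via_splice(fd, buf, len).  The buffer must not be modified until
the pipe has been drained, which the inner loop does before the helper
returns.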

The compromise my patch makes is to hide placeholders from almost
everything except the DIO code.  It may be worthwhile to turn the
placeholders into an IO marker that can be useful to filemap_fdatawrite
and friends.

It should be able to:

- record the userland/kernel pages involved in a given io
- map blocks from the FS for making a bio
- start the io
- wake people up when the io is done
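
One way to picture those four duties is as a small state machine.  The
sketch below is a user-space toy, not kernel code and not taken from any
patch: struct io_marker, the iom_* functions and the state names are all
hypothetical, and real block mapping, bio submission and wakeups are
replaced by simple field updates.

```c
#include <assert.h>
#include <stddef.h>

enum iom_state { IOM_NEW, IOM_RECORDED, IOM_MAPPED, IOM_IN_FLIGHT, IOM_DONE };

/* Hypothetical IO marker: tracks one IO from page capture to completion. */
struct io_marker {
    void **pages;                   /* userland/kernel pages in this io */
    size_t nr_pages;
    unsigned long long first_block; /* first FS block backing the io */
    enum iom_state state;
    int wakeups;                    /* stand-in for waking waiters */
};

static void iom_init(struct io_marker *m)
{
    assert(m != NULL);
    m->pages = NULL;
    m->nr_pages = 0;
    m->first_block = 0;
    m->state = IOM_NEW;
    m->wakeups = 0;
}

/* Step 1: record the pages involved in the io. */
static int iom_record_pages(struct io_marker *m, void **pages, size_t nr)
{
    if (m->state != IOM_NEW || nr == 0)
        return -1;
    m->pages = pages;
    m->nr_pages = nr;
    m->state = IOM_RECORDED;
    return 0;
}

/* Step 2: remember the block mapping needed to build a bio. */
static int iom_map_blocks(struct io_marker *m, unsigned long long first_block)
{
    if (m->state != IOM_RECORDED)
        return -1;
    m->first_block = first_block;
    m->state = IOM_MAPPED;
    return 0;
}

/* Step 3: start the io (here: only a state change). */
static int iom_start(struct io_marker *m)
{
    if (m->state != IOM_MAPPED)
        return -1;
    m->state = IOM_IN_FLIGHT;
    return 0;
}

/* Step 4: the io finished; wake whoever was waiting on the marker. */
static int iom_complete(struct io_marker *m)
{
    if (m->state != IOM_IN_FLIGHT)
        return -1;
    m->state = IOM_DONE;
    m->wakeups++;
    return 0;
}
```

Each step refuses to run out of order, so a caller that tries to start
the io before pages are recorded and blocks are mapped gets -1 back.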

This would allow splice to operate without stealing the userland page
(stealing would still be an option of course), and could get rid of big
chunks of fs/direct-io.c.

-chris

