public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Michael Tokarev <mjt@tls.msk.ru>
To: Linus Torvalds <torvalds@osdl.org>
Cc: Chris Mason <chris.mason@oracle.com>,
	dean gaudet <dean@arctic.org>, Viktor <vvp01@inbox.ru>,
	Aubrey <aubreylee@gmail.com>, Hua Zhong <hzhong@gmail.com>,
	Hugh Dickins <hugh@veritas.com>,
	linux-kernel@vger.kernel.org, hch@infradead.org,
	kenneth.w.chen@intel.com, akpm@osdl.org
Subject: Re: O_DIRECT question
Date: Sat, 13 Jan 2007 01:26:41 +0300	[thread overview]
Message-ID: <45A80B21.6040609@tls.msk.ru> (raw)
In-Reply-To: <Pine.LNX.4.64.0701121705440.3470@woody.osdl.org>

Linus Torvalds wrote:
> 
> On Sat, 13 Jan 2007, Michael Tokarev wrote:
>>> At that point, O_DIRECT would be a way of saying "we're going to do 
>>> uncached accesses to this pre-allocated file". Which is a half-way 
>>> sensible thing to do.
>> Half-way?
> 
> I suspect a lot of people actually have other reasons to avoid caches. 
> 
> For example, the reason to do O_DIRECT may well not be that you want to 
> avoid caching per se, but simply because you want to limit page cache 
> activity. In which case O_DIRECT "works", but it's really the wrong thing 
> to do. We could export other ways to do what people ACTUALLY want, that 
> doesn't have the downsides.
> 
> For example, the page cache is absolutely required if you want to mmap. 
> There's no way you can do O_DIRECT and mmap at the same time and expect 
> any kind of sane behaviour. It may not be what a DB wants to use, but it's 
> an example of where O_DIRECT really falls down.

Provided when the two are about the same part of a file.  If not, and if
the file is "divided" on a proper boundary (sector/page/whatever-aligned),
there's no issues, at least not if all the blocks of a file has been allocated
(no gaps, that is).

What I was referring to in my last email - and said it's a corner case - is:
mmap() start of a file, say, first megabyte of it, where some index/bitmap is
located, and use direct-io on the rest.  So the two aren't overlap.

Still problematic?

>>> But what O_DIRECT does right now is _not_ really sensible, and the 
>>> O_DIRECT propeller-heads seem to have some problem even admitting that 
>>> there _is_ a problem, because they don't care. 
>> Well.  In fact, there's NO problems to admit.
>>
>> Yes, yes, yes yes - when you think about it from a general point of
>> view, and think how non-O_DIRECT and O_DIRECT access fits together,
>> it's a complete mess, and you're 100% right it's a mess.
> 
> You can't admit that even O_DIRECT _without_ any non-O_DIRECT actually 
> fails in many ways right now.
> 
> I've already mentioned ftruncate and block allocation. You don't seem to 
> understand that those are ALSO a problem.

I do understand this.  And this is, too, solved right now in userspace.
For example, when oracle allocates a file for its data, or when it extends
the file, it writes something to every block of new space (using O_DIRECT
while at it, but that's a different story).  The thing is: while it is doing
that, no process tries to do anything with that (part of a) file (not counting
some external processes run by evil hackers ;)  So there's still no races
or fundamental brokeness *in usage*.

It uses ftruncate() to create or extend a file, *and* does O_DIRECT writes
to force block allocations.  That's probably not right, and that alone is
probably difficult to implement in kernel (I just don't know; what I know
for sure is that this way is very slow on ext3).  Maybe because there's no
way to tell kernel something like "set the file size to this and actually
*allocate* space for it" (if it doesn't write some structure to the file).

What I dislike very much is - half-solutions.  And current O_DIRECT indeed
looks like half-a-solution, because sometimes it works, and sometimes, in
*wrong* usage scenario, it doesn't, or racy, etc, and kernel *allows* such
a wrong scenario.  A software should either work correctly, or disallow
a usage where it can't guarantee correctness.  Currently, kernel allows
incorrect usage, and that, plus all the ugly things in code done in attempt
to fix that, suxx.

But the whole thing is not (fundamentally) broken.

/mjt

  reply	other threads:[~2007-01-12 22:26 UTC|newest]

Thread overview: 130+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2007-01-11  2:57 O_DIRECT question Aubrey
2007-01-11  3:05 ` Linus Torvalds
2007-01-11  3:15   ` Linus Torvalds
2007-01-11  6:09     ` Nick Piggin
2007-01-11 15:50       ` Linus Torvalds
2007-01-11 16:19         ` Aubrey
2007-01-16  3:41           ` Jörn Engel
2007-01-11 16:23         ` bert hubert
2007-01-11 16:52         ` Xavier Bestel
2007-01-11 17:04           ` Linus Torvalds
2007-01-11 18:41             ` Trond Myklebust
2007-01-11 19:00               ` Linus Torvalds
2007-01-11 19:49                 ` Trond Myklebust
2007-01-12 17:03             ` Viktor
2007-01-20 16:19         ` Denis Vlasenko
2007-01-22 15:52           ` Phillip Susi
2007-01-11  5:50   ` Aubrey
2007-01-11  6:06     ` Andrew Morton
2007-01-11  6:45       ` Aubrey
2007-01-11  6:57         ` Andrew Morton
2007-01-11  7:05           ` Nick Piggin
2007-01-11  7:54             ` Aubrey
2007-01-11  8:05               ` Roy Huang
2007-01-11 16:45                 ` Linus Torvalds
2007-01-17  4:29                   ` Aubrey Li
2007-01-12  2:12                 ` Aubrey
2007-01-12  2:47                   ` Nick Piggin
2007-01-12  3:59                   ` Roy Huang
2007-01-11  8:12               ` Nick Piggin
2007-01-11  8:49                 ` Roy Huang
2007-01-11  9:09                   ` Nick Piggin
2007-01-12  2:48                 ` Bill Davidsen
2007-01-12  4:30                   ` Nick Piggin
2007-01-12  4:46                     ` Linus Torvalds
2007-01-12  4:56                       ` Nick Piggin
2007-01-12  4:58                         ` Nick Piggin
2007-01-12  5:18                         ` Linus Torvalds
2007-01-12  5:22                         ` Aubrey
2007-01-12 14:59                           ` Bill Davidsen
2007-01-13  4:51                             ` Nick Piggin
2007-01-11  6:16     ` Alexander Shishkin
2007-01-11  6:57       ` Aubrey
2007-01-11 12:13   ` Viktor
2007-01-11 15:53     ` Phillip Susi
2007-01-11 16:20     ` Linus Torvalds
2007-01-11 17:13       ` Michael Tokarev
2007-01-11 23:01         ` Phillip Susi
2007-01-11 23:06           ` Hua Zhong
2007-01-12 15:21             ` Phillip Susi
2007-01-20 16:36         ` Denis Vlasenko
2007-01-20 20:55           ` Michael Tokarev
2007-01-20 23:05             ` Denis Vlasenko
2007-01-21 12:09               ` Michael Tokarev
2007-01-21 20:02                 ` Denis Vlasenko
2007-01-22 16:17                   ` Phillip Susi
2007-01-24 21:15                     ` Denis Vlasenko
2007-01-25 15:44                       ` Phillip Susi
2007-01-25 17:38                         ` Denis Vlasenko
2007-01-25 19:28                           ` Phillip Susi
2007-01-25 19:52                             ` Denis Vlasenko
2007-01-25 20:03                               ` Phillip Susi
2007-01-25 20:45                                 ` Michael Tokarev
2007-01-25 21:11                                   ` Denis Vlasenko
2007-01-26 16:02                                     ` Mark Lord
2007-01-26 16:52                                       ` Viktor
2007-01-26 16:58                                       ` Phillip Susi
2007-01-26 17:05                                     ` Phillip Susi
2007-01-26 23:16                                       ` Denis Vlasenko
2007-02-06 20:39                                         ` Pavel Machek
2007-01-26 18:23                                     ` Bill Davidsen
2007-01-26 23:35                                       ` Denis Vlasenko
2007-01-28 15:18                                         ` Bill Davidsen
2007-01-28 17:03                                           ` Denis Vlasenko
2007-01-29 15:43                                             ` Phillip Susi
2007-01-29 17:00                                             ` Andrea Arcangeli
2007-01-30  0:05                                               ` Denis Vlasenko
     [not found]                                               ` <45BE7D99.70200@cfl.rr.com>
     [not found]                                                 ` <20070130023056.GN8030@opteron.random>
     [not found]                                                   ` <45BF65E3.6070102@cfl.rr.com>
     [not found]                                                     ` <20070130164806.GQ8030@opteron.random>
2007-01-30 18:50                                                       ` Phillip Susi
2007-01-30 19:57                                                         ` Andrea Arcangeli
2007-01-30 20:06                                                           ` Andrea Arcangeli
2007-01-30 23:07                                                           ` Phillip Susi
2007-01-31  2:28                                                             ` Andrea Arcangeli
2007-01-31  9:37                                                             ` Michael Tokarev
2007-01-26 15:53                   ` Bill Davidsen
2007-01-11 17:42       ` Alan
2007-01-11 18:00         ` Linus Torvalds
2007-01-12  7:57       ` dean gaudet
2007-01-12 15:27         ` Phillip Susi
2007-01-12 18:06         ` Linus Torvalds
2007-01-12 20:23           ` Chris Mason
2007-01-12 20:46             ` Michael Tokarev
2007-01-12 20:52               ` Michael Tokarev
2007-01-12 21:03                 ` Michael Tokarev
2007-01-12 21:17                   ` Linus Torvalds
2007-01-12 21:54                     ` Michael Tokarev
2007-01-12 22:09                       ` Linus Torvalds
2007-01-12 22:26                         ` Michael Tokarev [this message]
2007-01-12 22:35                         ` Erik Andersen
2007-01-12 22:47                           ` Andrew Morton
2007-01-14  9:11                             ` Nate Diller
2007-01-20 16:45                               ` Denis Vlasenko
2007-01-22  1:47                             ` Andrea Arcangeli
2007-01-13 20:07                     ` Bill Davidsen
2007-01-13 20:27                       ` Michael Tokarev
2007-01-14 15:39                         ` Bill Davidsen
2007-01-12 21:39                   ` Disk Cache, Was: " Zan Lynx
2007-01-12 22:10                     ` Michael Tokarev
2007-01-15 12:11               ` Helge Hafting
2007-01-12 16:59       ` Viktor
2007-01-11 12:45   ` Erik Mouw
2007-01-11  4:51 ` Andrew Morton
2007-01-11  5:06   ` Gerrit Huizenga
2007-01-11 16:09   ` Badari Pulavarty
2007-01-11 12:34 ` linux-os (Dick Johnson)
2007-01-11 13:06   ` Martin Mares
2007-01-11 14:15   ` Jens Axboe
2007-01-12  2:13   ` Bill Davidsen
  -- strict thread matches above, loose matches on Subject: below --
2007-01-17 14:27 Alex Tomas
2007-01-22 15:59 Al Boldi
     [not found] <7BYkO-5OV-17@gated-at.bofh.it>
     [not found] ` <7BYul-6gz-5@gated-at.bofh.it>
     [not found]   ` <7C18X-1zo-5@gated-at.bofh.it>
     [not found]     ` <7C1iw-22q-7@gated-at.bofh.it>
     [not found]       ` <7C1Vb-2Ny-3@gated-at.bofh.it>
     [not found]         ` <7C256-2ZR-27@gated-at.bofh.it>
     [not found]           ` <7C2eE-3rT-15@gated-at.bofh.it>
     [not found]             ` <7C31d-4qb-11@gated-at.bofh.it>
     [not found]               ` <7C3kj-55E-9@gated-at.bofh.it>
2007-01-11 13:20                 ` Bodo Eggert
     [not found]   ` <7C74B-2A4-23@gated-at.bofh.it>
     [not found]     ` <7CaYA-mT-19@gated-at.bofh.it>
     [not found]       ` <7Cpuz-64X-1@gated-at.bofh.it>
     [not found]         ` <7Cz0T-4PH-17@gated-at.bofh.it>
     [not found]           ` <7CBcl-86B-9@gated-at.bofh.it>
     [not found]             ` <7CBvH-52-9@gated-at.bofh.it>
     [not found]               ` <7CBFn-hw-1@gated-at.bofh.it>
     [not found]                 ` <7CBP1-KI-3@gated-at.bofh.it>
     [not found]                   ` <7CBYG-WK-3@gated-at.bofh.it>
2007-01-13 16:53                     ` Bodo Eggert
2007-01-13 19:30                       ` Bill Davidsen
2007-01-14 18:51                         ` Bodo Eggert
     [not found]                     ` <7CXmz-88G-29@gated-at.bofh.it>
     [not found]                       ` <7CXFR-8vZ-15@gated-at.bofh.it>
     [not found]                         ` <7DfMP-2ak-19@gated-at.bofh.it>
2007-01-14 19:39                           ` Bodo Eggert
     [not found]               ` <7DyYK-6lE-3@gated-at.bofh.it>
2007-01-16 20:26                 ` Bodo Eggert
2007-01-17  5:55                   ` Arjan van de Ven
2007-01-17 22:36                     ` Bodo Eggert
     [not found] ` <7HkaQ-2Nb-9@gated-at.bofh.it>
     [not found]   ` <7HDZP-Pv-1@gated-at.bofh.it>
     [not found]     ` <7HIPV-8kp-35@gated-at.bofh.it>
2007-01-27 14:01       ` Bodo Eggert
2007-01-27 14:14         ` Denis Vlasenko
2007-01-28 15:30           ` Bill Davidsen
2007-01-28 17:18             ` Denis Vlasenko

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=45A80B21.6040609@tls.msk.ru \
    --to=mjt@tls.msk.ru \
    --cc=akpm@osdl.org \
    --cc=aubreylee@gmail.com \
    --cc=chris.mason@oracle.com \
    --cc=dean@arctic.org \
    --cc=hch@infradead.org \
    --cc=hugh@veritas.com \
    --cc=hzhong@gmail.com \
    --cc=kenneth.w.chen@intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=torvalds@osdl.org \
    --cc=vvp01@inbox.ru \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox