public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Michael Tokarev <mjt@tls.msk.ru>
To: Linus Torvalds <torvalds@osdl.org>
Cc: Chris Mason <chris.mason@oracle.com>,
	dean gaudet <dean@arctic.org>, Viktor <vvp01@inbox.ru>,
	Aubrey <aubreylee@gmail.com>, Hua Zhong <hzhong@gmail.com>,
	Hugh Dickins <hugh@veritas.com>,
	linux-kernel@vger.kernel.org, hch@infradead.org,
	kenneth.w.chen@intel.com, akpm@osdl.org
Subject: Re: O_DIRECT question
Date: Sat, 13 Jan 2007 00:54:23 +0300	[thread overview]
Message-ID: <45A8038F.2040609@tls.msk.ru> (raw)
In-Reply-To: <Pine.LNX.4.64.0701121611370.3470@woody.osdl.org>

Linus Torvalds wrote:
[]
> My point is that you can get basically ALL THE SAME GOOD BEHAVIOUR without 
> having all the BAD behaviour that O_DIRECT adds.

*This* point I got from the beginning, once I tried to think how it all
is done internally (I never thought about that, because I'm not a kernel
hacker to start with) -- currently, linux has ugly/racy places which are
either difficult or impossible to fix, all due to this O_DIRECT thing
which iteracts badly with other access "methods".

> For example, just the requirement that O_DIRECT can never create a file 
> mapping, and can never interact with ftruncate would actually make 
> O_DIRECT a lot more palatable to me. Together with just the requirement 
> that an O_DIRECT open would literally disallow any non-O_DIRECT accesses, 
> and flush the page cache entirely, would make all the aliases go away.
> 
> At that point, O_DIRECT would be a way of saying "we're going to do 
> uncached accesses to this pre-allocated file". Which is a half-way 
> sensible thing to do.

Half-way?

> But what O_DIRECT does right now is _not_ really sensible, and the 
> O_DIRECT propeller-heads seem to have some problem even admitting that 
> there _is_ a problem, because they don't care. 

Well.  In fact, there's NO problems to admit.

Yes, yes, yes yes - when you think about it from a general point of
view, and think how non-O_DIRECT and O_DIRECT access fits together,
it's a complete mess, and you're 100% right it's a mess.

But.  Those damn "database people" don't mix and match the two accesses
together (I'm not one of them, either - I'm just trying to use a DB
product on linux).  So there's just no issue.  The solution to in-kernel
races and problems in this case is the usage scenario, and in following
simple usage rules.  Basically, the above requiriment - "don't mix&match
the two together" - is implemented in userspace (yes, there's no guarantee
that someone/thing will not do some evil thing, but that's controlled by
file permisions).  That is, database software itself will not try to use
the thing in a wrong way.  Simple as that.

> A lot of DB people seem to simply not care about security or anything 
> else.anything else. I'm trying to tell you that quoting numbers is 
> pointless, when simply the CORRECTNESS of O_DIRECT is very much in doubt.

When done properly - be it in user- or kernel-space, it IS correct.  No
database people are ftruncating() a file *and* reading from the past-end
of it at the same time for example, and don't mix-n-match cached and direct
io, at least not for the same part of a file (if there are, they're really
braindead, or it's just a plain bug).

> I can calculate PI to a billion decimal places in my head in .1 seconds. 
> If you don't care about the CORRECTNESS of the result, that is.
> 
> See? It's not about performance. It's about O_DIRECT being fundamentally 
> broken as it behaves right now.

I recall again the above: the actual USAGE of O_DIRECT, as implemented
in database software, tries to ensure there's no brokeness, especially
fundamental brokeness, just by not performing parallel direct/non-direct
read/writes/truncates.  This way, the thing Just Works, works *correctly*
(provided there's no bugs all the way down to a device), *and* works *fast*.

By the way, I can think of some useful cases where *parts* of a file are
mmap()ed (even for RW access), and parts are being read/written with O_DIRECT.
But that's probably some corner cases.

/mjt

  reply	other threads:[~2007-01-12 21:54 UTC|newest]

Thread overview: 130+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2007-01-11  2:57 O_DIRECT question Aubrey
2007-01-11  3:05 ` Linus Torvalds
2007-01-11  3:15   ` Linus Torvalds
2007-01-11  6:09     ` Nick Piggin
2007-01-11 15:50       ` Linus Torvalds
2007-01-11 16:19         ` Aubrey
2007-01-16  3:41           ` Jörn Engel
2007-01-11 16:23         ` bert hubert
2007-01-11 16:52         ` Xavier Bestel
2007-01-11 17:04           ` Linus Torvalds
2007-01-11 18:41             ` Trond Myklebust
2007-01-11 19:00               ` Linus Torvalds
2007-01-11 19:49                 ` Trond Myklebust
2007-01-12 17:03             ` Viktor
2007-01-20 16:19         ` Denis Vlasenko
2007-01-22 15:52           ` Phillip Susi
2007-01-11  5:50   ` Aubrey
2007-01-11  6:06     ` Andrew Morton
2007-01-11  6:45       ` Aubrey
2007-01-11  6:57         ` Andrew Morton
2007-01-11  7:05           ` Nick Piggin
2007-01-11  7:54             ` Aubrey
2007-01-11  8:05               ` Roy Huang
2007-01-11 16:45                 ` Linus Torvalds
2007-01-17  4:29                   ` Aubrey Li
2007-01-12  2:12                 ` Aubrey
2007-01-12  2:47                   ` Nick Piggin
2007-01-12  3:59                   ` Roy Huang
2007-01-11  8:12               ` Nick Piggin
2007-01-11  8:49                 ` Roy Huang
2007-01-11  9:09                   ` Nick Piggin
2007-01-12  2:48                 ` Bill Davidsen
2007-01-12  4:30                   ` Nick Piggin
2007-01-12  4:46                     ` Linus Torvalds
2007-01-12  4:56                       ` Nick Piggin
2007-01-12  4:58                         ` Nick Piggin
2007-01-12  5:18                         ` Linus Torvalds
2007-01-12  5:22                         ` Aubrey
2007-01-12 14:59                           ` Bill Davidsen
2007-01-13  4:51                             ` Nick Piggin
2007-01-11  6:16     ` Alexander Shishkin
2007-01-11  6:57       ` Aubrey
2007-01-11 12:13   ` Viktor
2007-01-11 15:53     ` Phillip Susi
2007-01-11 16:20     ` Linus Torvalds
2007-01-11 17:13       ` Michael Tokarev
2007-01-11 23:01         ` Phillip Susi
2007-01-11 23:06           ` Hua Zhong
2007-01-12 15:21             ` Phillip Susi
2007-01-20 16:36         ` Denis Vlasenko
2007-01-20 20:55           ` Michael Tokarev
2007-01-20 23:05             ` Denis Vlasenko
2007-01-21 12:09               ` Michael Tokarev
2007-01-21 20:02                 ` Denis Vlasenko
2007-01-22 16:17                   ` Phillip Susi
2007-01-24 21:15                     ` Denis Vlasenko
2007-01-25 15:44                       ` Phillip Susi
2007-01-25 17:38                         ` Denis Vlasenko
2007-01-25 19:28                           ` Phillip Susi
2007-01-25 19:52                             ` Denis Vlasenko
2007-01-25 20:03                               ` Phillip Susi
2007-01-25 20:45                                 ` Michael Tokarev
2007-01-25 21:11                                   ` Denis Vlasenko
2007-01-26 16:02                                     ` Mark Lord
2007-01-26 16:52                                       ` Viktor
2007-01-26 16:58                                       ` Phillip Susi
2007-01-26 17:05                                     ` Phillip Susi
2007-01-26 23:16                                       ` Denis Vlasenko
2007-02-06 20:39                                         ` Pavel Machek
2007-01-26 18:23                                     ` Bill Davidsen
2007-01-26 23:35                                       ` Denis Vlasenko
2007-01-28 15:18                                         ` Bill Davidsen
2007-01-28 17:03                                           ` Denis Vlasenko
2007-01-29 15:43                                             ` Phillip Susi
2007-01-29 17:00                                             ` Andrea Arcangeli
2007-01-30  0:05                                               ` Denis Vlasenko
     [not found]                                               ` <45BE7D99.70200@cfl.rr.com>
     [not found]                                                 ` <20070130023056.GN8030@opteron.random>
     [not found]                                                   ` <45BF65E3.6070102@cfl.rr.com>
     [not found]                                                     ` <20070130164806.GQ8030@opteron.random>
2007-01-30 18:50                                                       ` Phillip Susi
2007-01-30 19:57                                                         ` Andrea Arcangeli
2007-01-30 20:06                                                           ` Andrea Arcangeli
2007-01-30 23:07                                                           ` Phillip Susi
2007-01-31  2:28                                                             ` Andrea Arcangeli
2007-01-31  9:37                                                             ` Michael Tokarev
2007-01-26 15:53                   ` Bill Davidsen
2007-01-11 17:42       ` Alan
2007-01-11 18:00         ` Linus Torvalds
2007-01-12  7:57       ` dean gaudet
2007-01-12 15:27         ` Phillip Susi
2007-01-12 18:06         ` Linus Torvalds
2007-01-12 20:23           ` Chris Mason
2007-01-12 20:46             ` Michael Tokarev
2007-01-12 20:52               ` Michael Tokarev
2007-01-12 21:03                 ` Michael Tokarev
2007-01-12 21:17                   ` Linus Torvalds
2007-01-12 21:54                     ` Michael Tokarev [this message]
2007-01-12 22:09                       ` Linus Torvalds
2007-01-12 22:26                         ` Michael Tokarev
2007-01-12 22:35                         ` Erik Andersen
2007-01-12 22:47                           ` Andrew Morton
2007-01-14  9:11                             ` Nate Diller
2007-01-20 16:45                               ` Denis Vlasenko
2007-01-22  1:47                             ` Andrea Arcangeli
2007-01-13 20:07                     ` Bill Davidsen
2007-01-13 20:27                       ` Michael Tokarev
2007-01-14 15:39                         ` Bill Davidsen
2007-01-12 21:39                   ` Disk Cache, Was: " Zan Lynx
2007-01-12 22:10                     ` Michael Tokarev
2007-01-15 12:11               ` Helge Hafting
2007-01-12 16:59       ` Viktor
2007-01-11 12:45   ` Erik Mouw
2007-01-11  4:51 ` Andrew Morton
2007-01-11  5:06   ` Gerrit Huizenga
2007-01-11 16:09   ` Badari Pulavarty
2007-01-11 12:34 ` linux-os (Dick Johnson)
2007-01-11 13:06   ` Martin Mares
2007-01-11 14:15   ` Jens Axboe
2007-01-12  2:13   ` Bill Davidsen
  -- strict thread matches above, loose matches on Subject: below --
2007-01-17 14:27 Alex Tomas
2007-01-22 15:59 Al Boldi
     [not found] <7BYkO-5OV-17@gated-at.bofh.it>
     [not found] ` <7BYul-6gz-5@gated-at.bofh.it>
     [not found]   ` <7C18X-1zo-5@gated-at.bofh.it>
     [not found]     ` <7C1iw-22q-7@gated-at.bofh.it>
     [not found]       ` <7C1Vb-2Ny-3@gated-at.bofh.it>
     [not found]         ` <7C256-2ZR-27@gated-at.bofh.it>
     [not found]           ` <7C2eE-3rT-15@gated-at.bofh.it>
     [not found]             ` <7C31d-4qb-11@gated-at.bofh.it>
     [not found]               ` <7C3kj-55E-9@gated-at.bofh.it>
2007-01-11 13:20                 ` Bodo Eggert
     [not found]   ` <7C74B-2A4-23@gated-at.bofh.it>
     [not found]     ` <7CaYA-mT-19@gated-at.bofh.it>
     [not found]       ` <7Cpuz-64X-1@gated-at.bofh.it>
     [not found]         ` <7Cz0T-4PH-17@gated-at.bofh.it>
     [not found]           ` <7CBcl-86B-9@gated-at.bofh.it>
     [not found]             ` <7CBvH-52-9@gated-at.bofh.it>
     [not found]               ` <7CBFn-hw-1@gated-at.bofh.it>
     [not found]                 ` <7CBP1-KI-3@gated-at.bofh.it>
     [not found]                   ` <7CBYG-WK-3@gated-at.bofh.it>
2007-01-13 16:53                     ` Bodo Eggert
2007-01-13 19:30                       ` Bill Davidsen
2007-01-14 18:51                         ` Bodo Eggert
     [not found]                     ` <7CXmz-88G-29@gated-at.bofh.it>
     [not found]                       ` <7CXFR-8vZ-15@gated-at.bofh.it>
     [not found]                         ` <7DfMP-2ak-19@gated-at.bofh.it>
2007-01-14 19:39                           ` Bodo Eggert
     [not found]               ` <7DyYK-6lE-3@gated-at.bofh.it>
2007-01-16 20:26                 ` Bodo Eggert
2007-01-17  5:55                   ` Arjan van de Ven
2007-01-17 22:36                     ` Bodo Eggert
     [not found] ` <7HkaQ-2Nb-9@gated-at.bofh.it>
     [not found]   ` <7HDZP-Pv-1@gated-at.bofh.it>
     [not found]     ` <7HIPV-8kp-35@gated-at.bofh.it>
2007-01-27 14:01       ` Bodo Eggert
2007-01-27 14:14         ` Denis Vlasenko
2007-01-28 15:30           ` Bill Davidsen
2007-01-28 17:18             ` Denis Vlasenko

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=45A8038F.2040609@tls.msk.ru \
    --to=mjt@tls.msk.ru \
    --cc=akpm@osdl.org \
    --cc=aubreylee@gmail.com \
    --cc=chris.mason@oracle.com \
    --cc=dean@arctic.org \
    --cc=hch@infradead.org \
    --cc=hugh@veritas.com \
    --cc=hzhong@gmail.com \
    --cc=kenneth.w.chen@intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=torvalds@osdl.org \
    --cc=vvp01@inbox.ru \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox