From: Michael Tokarev <mjt@tls.msk.ru>
To: Linus Torvalds <torvalds@osdl.org>
Cc: Chris Mason <chris.mason@oracle.com>,
	dean gaudet <dean@arctic.org>, Viktor <vvp01@inbox.ru>,
	Aubrey <aubreylee@gmail.com>, Hua Zhong <hzhong@gmail.com>,
	Hugh Dickins <hugh@veritas.com>,
	linux-kernel@vger.kernel.org, hch@infradead.org,
	kenneth.w.chen@intel.com, akpm@osdl.org
Subject: Re: O_DIRECT question
Date: Sat, 13 Jan 2007 00:54:23 +0300
Message-ID: <45A8038F.2040609@tls.msk.ru>
In-Reply-To: <Pine.LNX.4.64.0701121611370.3470@woody.osdl.org>

Linus Torvalds wrote:
[]
> My point is that you can get basically ALL THE SAME GOOD BEHAVIOUR without 
> having all the BAD behaviour that O_DIRECT adds.

*This* point I got from the beginning, once I tried to think about how it
is all done internally (I had never thought about that, because I'm not a
kernel hacker to start with) -- currently, Linux has ugly/racy places which
are either difficult or impossible to fix, all due to this O_DIRECT thing,
which interacts badly with other access "methods".

> For example, just the requirement that O_DIRECT can never create a file 
> mapping, and can never interact with ftruncate would actually make 
> O_DIRECT a lot more palatable to me. Together with just the requirement 
> that an O_DIRECT open would literally disallow any non-O_DIRECT accesses, 
> and flush the page cache entirely, would make all the aliases go away.
> 
> At that point, O_DIRECT would be a way of saying "we're going to do 
> uncached accesses to this pre-allocated file". Which is a half-way 
> sensible thing to do.

Half-way?

> But what O_DIRECT does right now is _not_ really sensible, and the 
> O_DIRECT propeller-heads seem to have some problem even admitting that 
> there _is_ a problem, because they don't care. 

Well.  In fact, there are NO problems to admit.

Yes, yes, yes yes - when you think about it from a general point of
view, and think how non-O_DIRECT and O_DIRECT access fits together,
it's a complete mess, and you're 100% right it's a mess.

But.  Those damn "database people" don't mix and match the two kinds of
access (I'm not one of them, either - I'm just trying to use a DB
product on Linux).  So there's just no issue.  The solution to the in-kernel
races and problems in this case is the usage scenario: following simple
usage rules.  Basically, the above requirement - "don't mix and match
the two together" - is implemented in userspace (yes, there's no guarantee
that someone or something will not do some evil thing, but that's controlled
by file permissions).  That is, the database software itself will not try to
use the thing in a wrong way.  Simple as that.
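
To make the userspace rule concrete, here is a minimal sketch of what
"direct access only, to a pre-allocated file" could look like.  It is an
illustration under stated assumptions, not code from any real database:
the names dio_aligned, dio_open and dio_write_block and the 4096-byte
DIO_BLOCK are hypothetical, and the real alignment requirement depends on
the device and filesystem.

```c
#define _GNU_SOURCE            /* exposes O_DIRECT in <fcntl.h> on Linux/glibc */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical block size; real code should query the device/filesystem. */
#define DIO_BLOCK 4096

/* Returns 1 if the buffer address, the length and the file offset are all
 * multiples of `block`, 0 otherwise. */
int dio_aligned(const void *buf, size_t len, off_t off, size_t block)
{
    assert(block != 0);
    if (off < 0)
        return 0;
    return ((uintptr_t)buf % block == 0) &&
           (len % block == 0) &&
           ((uint64_t)off % block == 0);
}

/* Open an existing, pre-allocated file for direct I/O only.
 * No O_CREAT and no O_TRUNC: this descriptor never changes the file size. */
int dio_open(const char *path)
{
#ifdef O_DIRECT
    return open(path, O_RDWR | O_DIRECT);
#else
    (void)path;
    errno = ENOTSUP;           /* platform has no O_DIRECT flag */
    return -1;
#endif
}

/* Write aligned data at an aligned offset; refuse anything else up front
 * instead of letting the kernel reject it. */
ssize_t dio_write_block(int fd, const void *buf, size_t len, off_t off)
{
    if (!dio_aligned(buf, len, off, DIO_BLOCK)) {
        errno = EINVAL;
        return -1;
    }
    return pwrite(fd, buf, len, off);
}
```

A caller would get its buffer from posix_memalign(&p, DIO_BLOCK, n) and do
all I/O on the file through this one descriptor; nothing in the sketch
stops another process from opening the same file without O_DIRECT, which
is exactly the part left to file permissions.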

> A lot of DB people seem to simply not care about security or anything 
> else. I'm trying to tell you that quoting numbers is 
> pointless, when simply the CORRECTNESS of O_DIRECT is very much in doubt.

When done properly - be it in user- or kernel-space - it IS correct.  No
database people are ftruncate()ing a file *and* reading past the end
of it at the same time, for example, and they don't mix-n-match cached and
direct I/O, at least not for the same part of a file (if they do, they're
really braindead, or it's just a plain bug).

> I can calculate PI to a billion decimal places in my head in .1 seconds. 
> If you don't care about the CORRECTNESS of the result, that is.
> 
> See? It's not about performance. It's about O_DIRECT being fundamentally 
> broken as it behaves right now.

To repeat the above: the actual USAGE of O_DIRECT, as implemented
in database software, tries to ensure there's no brokenness, especially
fundamental brokenness, just by not performing parallel direct/non-direct
reads/writes/truncates.  This way, the thing Just Works, works *correctly*
(provided there are no bugs all the way down to the device), *and* works *fast*.

By the way, I can think of some useful cases where *parts* of a file are
mmap()ed (even for RW access), and other parts are read/written with O_DIRECT.
But those are probably corner cases.
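
One way such a split could be kept safe is for the application to record
which byte range is mapped and refuse direct I/O that touches it.  The
sketch below only illustrates that bookkeeping; struct byte_range,
ranges_overlap and direct_io_allowed are hypothetical names, and it assumes
the mapped range is already rounded out to page boundaries and that
start + len does not overflow.

```c
#include <stdint.h>

/* A half-open byte range [start, start + len) within one file. */
struct byte_range {
    uint64_t start;
    uint64_t len;
};

/* Returns 1 if the two ranges share at least one byte, 0 otherwise.
 * Empty ranges never overlap anything. */
int ranges_overlap(struct byte_range a, struct byte_range b)
{
    if (a.len == 0 || b.len == 0)
        return 0;
    return a.start < b.start + b.len && b.start < a.start + a.len;
}

/* Returns 1 if a direct I/O request stays clear of the mmap()ed part of
 * the file, 0 if it would alias it and should be refused. */
int direct_io_allowed(struct byte_range mapped, struct byte_range request)
{
    return !ranges_overlap(mapped, request);
}
```

With the first 8192 bytes mapped, a direct write of 4096 bytes at offset
8192 passes the check, while the same write at offset 4096 is refused.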

/mjt
