From: Eric Dumazet <dada1@cosmosbay.com>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>,
Jens Axboe <jens.axboe@oracle.com>,
linux-kernel@vger.kernel.org, cotte@de.ibm.com, hugh@veritas.com,
neilb@suse.de, zanussi@us.ibm.com, hch@infradead.org
Subject: Re: [PATCH] sendfile removal
Date: Fri, 01 Jun 2007 18:47:19 +0200 [thread overview]
Message-ID: <46604D97.5050000@cosmosbay.com> (raw)
In-Reply-To: <alpine.LFD.0.98.0706010904110.3957@woody.linux-foundation.org>
Linus Torvalds a écrit :
>
> On Fri, 1 Jun 2007, H. Peter Anvin wrote:
>> Fair enough. Unix has traditionally not acknowledged the possibility of
>> nonblocking I/O on conventional files, for some odd reason.
>
> It's not odd at all.
>
> If you return EAGAIN, you had better have a way to _wait_ for that EAGAIN
> to go away, otherwise the EAGAIN is just a total waste of time.
>
> So the rule about EAGAIN is very simple:
> (a) the file descriptor must be O_NONBLOCK
> (b) the access must otherwise block
> AND
> (c) the condition must be something we can wait for with poll/select
>
> I don't know why people continually ignore that (c) point, even though
> it's obvious and very very important!
>
> If you cannot wait for it, tell me why the kernel should _ever_ return
> EAGAIN? The only option for the user is to just do the operation again
> immediately.
>
> And the thing is, neither poll nor select work on regular files. And no,
> that is _not_ just an implementation issue. It's very fundamental: neither
> poll nor select get the file offset to wait for!
>
> And that file offset is _critical_ for a regular file, in a way it
> obviously is _not_ for a socket, pipe, or other special file. Because
> without knowing the file offset, you cannot know which page you should be
> waiting for!
>
> And no, the file offset is not "f_pos". sendfile(), along with
> pread/pwrite, uses a totally separate file offset, so if select/poll were
> to base their decision on f_pos, they'd be _wrong_.
>
> This really is very fundamental.
>
> Now, you can argue that you can always just return -EAGAIN anyway, but
> then the calling process will basically be busy-looping, calling
> sendfile() (or splice()) over and over again. That's _horrible_. It's much
> better to just not return EAGAIN, and sleep like a good process should!
>
> So there's a few things to take away from this:
>
> - regular file access MUST NOT return EAGAIN just because a page isn't
> in the cache. Doing so is simply a bug. No ifs, buts or maybe's about
> it!
>
> Busy-looping is NOT ACCEPTABLE!
yes, very true, but then some apps do this (and sometimes depends on yield())
>
> - you *could* make some alternative conventions:
>
> (a) you could make O_NONBLOCK mean that you'll at least
> guarantee that you *start* the IO, and while you never return
> EAGAIN, you migth validly return a _partial_ result!
>
> (b) variation on (a): it's ok to return EAGAIN if _you_ were the
> one who started the IO during this particular time aroudn the
> loop. But if you find a page that isn't up-to-date yet, and
> you didn't start the IO, you *must* wait for it, so that you
> end up returning EAGAIN atmost once! Exactly because
> busy-looping is simply not acceptable behaviour!
>
> I have to admit that I didn't look at what raw splice() itself does these
> days. I would not be surprised if Jens also didn't realize this very
> fundamental issue. It seems too easy to miss, because people think
> that EAGAIN stands on its own, and don't realize that EAGAIN must be
> paired with select/poll to make sense.
>
Right now, splice() has one SPLICE_F_NONBLOCK flag, and this flag is applied
on both sides (in & out)
So either :
1) We separate the flag into two flags NONBLOCK_IN & NONBLOCK_OUT, so that the
application is free to chose to busy-loop/yield if it wants.
2) We ignore NONBLOCK flag for regular files in splice() (and sendfile()),
just following current facto
3) We consider select()/poll()/splice() can be extended to regular files on
[f_pos] (select() and related functions have a meaning on non-seekable files,
so consider it can be extended on files only on current file pos)
next prev parent reply other threads:[~2007-06-01 16:49 UTC|newest]
Thread overview: 37+ messages / expand[flat|nested] mbox.gz Atom feed top
2007-05-31 10:33 [PATCH] sendfile removal Jens Axboe
2007-05-31 10:47 ` Jens Axboe
2007-05-31 10:47 ` Eric Dumazet
2007-05-31 10:53 ` Jens Axboe
2007-06-01 4:09 ` H. Peter Anvin
2007-06-01 5:41 ` Jens Axboe
2007-06-01 5:50 ` H. Peter Anvin
2007-06-01 7:22 ` Eric Dumazet
2007-06-01 15:52 ` H. Peter Anvin
2007-06-01 16:18 ` Linus Torvalds
2007-06-01 16:47 ` Eric Dumazet [this message]
2007-06-01 16:53 ` H. Peter Anvin
2007-06-02 15:02 ` Jens Axboe
2007-06-02 15:01 ` Jens Axboe
2007-06-02 15:40 ` Linus Torvalds
2007-06-02 16:35 ` Jens Axboe
[not found] ` <20070603130507.GA11170@mail.ustc.edu.cn>
2007-06-03 13:05 ` Fengguang Wu
[not found] ` <20070603142931.GA5916@mail.ustc.edu.cn>
2007-06-03 14:29 ` Fengguang Wu
[not found] ` <20070604004647.GA8076@mail.ustc.edu.cn>
2007-06-04 0:46 ` Fengguang Wu
2007-06-04 8:05 ` Jens Axboe
[not found] ` <20070604112214.GA7457@mail.ustc.edu.cn>
2007-06-04 11:22 ` Fengguang Wu
2007-06-01 16:22 ` Pádraig Brady
2007-05-31 10:55 ` Christoph Hellwig
2007-05-31 11:05 ` Jens Axboe
2007-05-31 12:26 ` Neil Brown
2007-05-31 12:27 ` Jens Axboe
2007-06-01 2:44 ` [PATCH] sendfile removal (nfsd update) Neil Brown
2007-06-01 5:44 ` Jens Axboe
2007-06-01 8:01 ` Jens Axboe
2007-06-01 8:15 ` [PATCH] sendfile removal Jens Axboe
2007-05-31 11:04 ` Carsten Otte
2007-05-31 11:06 ` Jens Axboe
2007-05-31 15:33 ` Tom Zanussi
2007-05-31 19:01 ` Jens Axboe
2007-05-31 17:06 ` Hugh Dickins
2007-05-31 17:31 ` Christoph Hellwig
2007-05-31 19:03 ` Jens Axboe
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=46604D97.5050000@cosmosbay.com \
--to=dada1@cosmosbay.com \
--cc=cotte@de.ibm.com \
--cc=hch@infradead.org \
--cc=hpa@zytor.com \
--cc=hugh@veritas.com \
--cc=jens.axboe@oracle.com \
--cc=linux-kernel@vger.kernel.org \
--cc=neilb@suse.de \
--cc=torvalds@linux-foundation.org \
--cc=zanussi@us.ibm.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox