public inbox for linux-kernel@vger.kernel.org
* disk-based fds in select/poll
@ 2001-06-04 20:43 Pierre Phaneuf
  2001-06-04 21:42 ` Alan Cox
  0 siblings, 1 reply; 5+ messages in thread
From: Pierre Phaneuf @ 2001-06-04 20:43 UTC (permalink / raw)
  To: linux-kernel


Pardon me if some parts of this seem clueless. While I'm no newbie in
userland, kernelspace I don't play in very often...

It's fairly widely-known that select/poll returns immediately when
testing a filesystem-based file descriptor for writability or
readability.
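
The behaviour described above can be observed with a small C sketch. The
helper name poll_regular_file() is made up for illustration; it only
assumes a POSIX system, where regular files always poll as ready:

```c
#include <fcntl.h>
#include <poll.h>
#include <unistd.h>

/*
 * Hypothetical helper: poll() a regular file with a zero timeout and
 * return the reported revents, or -1 on error. For regular files the
 * result says "ready" whether or not the data is in the page cache.
 */
int poll_regular_file(const char *path)
{
    int fd = open(path, O_RDONLY | O_NONBLOCK);
    if (fd < 0)
        return -1;

    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    int n = poll(&pfd, 1, 0);   /* timeout 0: never wait */
    close(fd);

    if (n < 0)
        return -1;
    return pfd.revents;
}
```

Calling poll_regular_file() on any readable file should report POLLIN
at once, which is why the readiness answer is of no use for deciding
whether a read() will hit the disk.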

On top of this, even when in non-blocking mode, read() could block if
the pages needed aren't in core. sendfile() behaves in a similar way.

What would be needed to alleviate this?

I am thinking that a read() (or sendfile()) that would block because the
pages aren't in core should instead post a request for the pages to be
loaded (some kind of readahead mechanism?) and return immediately (maybe
having given some data that *was* in core). A subsequent read() could
have the data available, but not necessarily (again, it should give
whatever it has in core, but return immediately).

sendfile() would be a lot more tricky to fix in that way I guess, but
could still be possible (the destination fd would be unwritable for a
while, until the transfer is finished). Also, the complexity would be
higher (instead of simply causing readahead to happen (which might
happen anyway), it would have to trigger the readahead, then get notification
of when the pages are in core to send over, all the while preventing
data from being written to the destination fd in some way).

In the meantime, I was also wondering if issuing smaller read()
requests in a row might give me a better chance of success. I *know*
read() will block, but if I only ask for, say, a page of data (rather
than asking for the full data and relying on the non-blocking to return
EAGAIN (like it should, IMHO!)), it shouldn't take too long, and could
possibly trigger some readahead to be done by the kernel, right?

Or will the readahead be done "on my own time", read() only returning
after the whole thing (my request + readahead) has been done?
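
A minimal sketch of the "small reads in a row" idea, in C. The names
read_one_chunk() and drain_in_chunks(), the CHUNK_SIZE of 4096 and the
between_chunks callback are all assumptions made for illustration. Each
read() may still block on disk; the sketch only bounds how much is
requested per call:

```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

#define CHUNK_SIZE 4096   /* assumed: one page per request */

/* Read at most one chunk. Returns bytes read, 0 at EOF, -1 on error. */
ssize_t read_one_chunk(int fd, char *buf, size_t buflen)
{
    size_t want = buflen < CHUNK_SIZE ? buflen : CHUNK_SIZE;
    ssize_t n;

    do {
        n = read(fd, buf, want);
    } while (n < 0 && errno == EINTR);   /* retry if interrupted */

    return n;
}

/*
 * Drain a file one chunk per iteration. between_chunks (may be NULL)
 * stands in for "go service the other fds in the event loop".
 * Returns the total number of bytes read, or -1 on error.
 */
long drain_in_chunks(int fd, void (*between_chunks)(void))
{
    char buf[CHUNK_SIZE];
    long total = 0;

    for (;;) {
        ssize_t n = read_one_chunk(fd, buf, sizeof buf);
        if (n < 0)
            return -1;
        if (n == 0)
            break;               /* end of file */
        total += n;
        if (between_chunks)
            between_chunks();
    }
    return total;
}
```

In a real event loop the file would be read one chunk per pass rather
than drained in a tight loop; the callback is only there to mark where
the other descriptors would get their turn.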

I remember seeing a suggestion by Linus for an event-based I/O
interface, similar to kqueue on FreeBSD but much simpler. I'd just say
"I want it too!", ok? :-)

I know about the mincore() trick with mmap()'d files, but with small
files, mmap()ing might not make sense (could be very often).
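
For reference, the mincore() trick can be sketched in C as follows. The
function name resident_pages() is hypothetical. The sketch assumes a
Linux/glibc system; note also that some kernels report every page as
resident when the caller does not own or cannot write the file:

```c
#define _DEFAULT_SOURCE   /* for mincore() on glibc */
#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper: map a file and count how many of its pages are
 * currently in core. Stores the page count in *total_pages and returns
 * the number of resident pages, or -1 on error (including empty files).
 */
long resident_pages(const char *path, long *total_pages)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) < 0 || st.st_size == 0) {
        close(fd);
        return -1;
    }

    long page = sysconf(_SC_PAGESIZE);
    size_t len = (size_t)st.st_size;
    size_t npages = (len + (size_t)page - 1) / (size_t)page;
    *total_pages = (long)npages;

    void *map = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);                    /* the mapping keeps the file alive */
    if (map == MAP_FAILED)
        return -1;

    unsigned char *vec = malloc(npages);
    long resident = -1;
    if (vec && mincore(map, len, vec) == 0) {
        resident = 0;
        for (size_t i = 0; i < npages; i++)
            resident += vec[i] & 1;   /* low bit set: page is in core */
    }

    free(vec);
    munmap(map, len);
    return resident;
}
```

The cost is visible here: every check needs an mmap()/munmap() pair
plus the mincore() call, which is the overhead that makes this
unattractive for many small files.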

SGI's AIO might be a solution here, does it use threads? I'm trying to
avoid context switching as much as possible, to keep the CPU cache as
warm as possible.

Well, I might not have the choice to use threads, after all...

(sorry if this message got in twice, I used an NNTP gateway the previous
time, I don't think it got through)

-- 
Pierre Phaneuf


* Re: disk-based fds in select/poll
  2001-06-04 20:43 disk-based fds in select/poll Pierre Phaneuf
@ 2001-06-04 21:42 ` Alan Cox
  2001-06-04 22:42   ` Pierre Phaneuf
  0 siblings, 1 reply; 5+ messages in thread
From: Alan Cox @ 2001-06-04 21:42 UTC (permalink / raw)
  To: Pierre Phaneuf; +Cc: linux-kernel

> I am thinking that a read() (or sendfile()) that would block because the
> pages aren't in core should instead post a request for the pages to be
> loaded (some kind of readahead mechanism?) and return immediately (maybe
> having given some data that *was* in core). A subsequent read() could

read() posts a readahead anyway, so streaming reads tend not to block much.

> SGI's AIO might be a solution here, does it use threads? I'm trying to
> avoid context switching as much as possible, to keep the CPU cache as
> warm as possible.

glibc 2.2 does thread-based aio_, and it will tend to avoid cache damage as
the threads share the mm, but on SMP it's quite possible the read will occur on
the other CPU. Of course, kernel-based I/O might do the same too.
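
As a rough illustration of the POSIX aio_ interface that glibc
implements with threads, here is a C sketch. The wrapper names
start_read() and check_read() are made up; aio_read(), aio_error() and
aio_return() are the standard calls. Older glibc versions may need
linking with -lrt:

```c
#include <aio.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>

/*
 * Hypothetical wrapper: queue an asynchronous read of len bytes at
 * offset off. cb and buf must stay valid until the request completes.
 * Returns 0 if the request was queued, -1 otherwise.
 */
int start_read(struct aiocb *cb, int fd, void *buf, size_t len, off_t off)
{
    memset(cb, 0, sizeof *cb);
    cb->aio_fildes = fd;
    cb->aio_buf = buf;
    cb->aio_nbytes = len;
    cb->aio_offset = off;
    cb->aio_sigevent.sigev_notify = SIGEV_NONE;  /* we poll for completion */
    return aio_read(cb);
}

/*
 * Hypothetical wrapper: non-blocking completion check.
 * Returns 0 while the read is in progress, 1 when done (byte count in
 * *result), or -1 on failure with errno set to the request's error.
 */
int check_read(struct aiocb *cb, ssize_t *result)
{
    int err = aio_error(cb);

    if (err == EINPROGRESS)
        return 0;
    if (err != 0) {
        (void)aio_return(cb);    /* release the request's resources */
        errno = err;
        return -1;
    }
    *result = aio_return(cb);
    return 1;
}
```

An event loop would call check_read() once per pass for each
outstanding request instead of blocking in read().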



* Re: disk-based fds in select/poll
  2001-06-04 21:42 ` Alan Cox
@ 2001-06-04 22:42   ` Pierre Phaneuf
  2001-06-04 23:16     ` Alan Cox
  0 siblings, 1 reply; 5+ messages in thread
From: Pierre Phaneuf @ 2001-06-04 22:42 UTC (permalink / raw)
  To: linux-kernel

Alan Cox wrote:

> > I am thinking that a read() (or sendfile()) that would block because the
> > pages aren't in core should instead post a request for the pages to be
> > loaded (some kind of readahead mechanism?) and return immediately (maybe
> > having given some data that *was* in core). A subsequent read() could
> 
> read() posts a readahead anyway, so streaming reads tend not to block much.

Ok, so even knowing that select "lies" about the readability of a file
fd, if I stick a file fd in my select-based loop anyway, but only try
to read a bit at a time (say, 4K or 8K), would that trigger readahead,
yet finish quickly enough that I can get back to processing other fds
in my select loop?

Wouldn't that cause too many syscalls to be done? Or if this is actually
the way to go without an actual thread, how should I go about determining an
optimal block size?

Was there anything new on the bind_event/get_events API idea that Linus
proposed a while ago? That one had got me foaming at the mouth... :-)

-- 
Pierre Phaneuf


* Re: disk-based fds in select/poll
  2001-06-04 22:42   ` Pierre Phaneuf
@ 2001-06-04 23:16     ` Alan Cox
  0 siblings, 0 replies; 5+ messages in thread
From: Alan Cox @ 2001-06-04 23:16 UTC (permalink / raw)
  To: Pierre Phaneuf; +Cc: linux-kernel

> Ok, so while knowing about select "lying" about readability of a file
> fd, if I would stick a file fd in my select-based loop anyway, but would

To be honest, you could fix select to return when the page was cached, and
return EWOULDBLOCK on reads if the page was not present. I don't think that
would actually break any apps, and the specs seem to allow it.

> only try to read a bit at a time (say, 4K or 8K) would trigger
> readahead, yet finish quickly enough that I can get back to processing
> other fds in my select loop?

Probably

> Wouldn't that cause too many syscalls to be done? Or if this is actually
> the way to go without an actual thread, how should I go about determining an
> optimal block size?

The fs block size, I suspect, or a small multiple thereof.
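
One way to obtain such a size from userland is sketched below in C. The
function name pick_read_size() and the 4096-byte fallback are
assumptions. It uses st_blksize from fstat(), which is the preferred
I/O block size and is not guaranteed to equal the filesystem block
size:

```c
#include <stddef.h>
#include <sys/stat.h>
#include <sys/types.h>

/*
 * Hypothetical helper: choose a read size as `multiple` times the
 * preferred I/O block size reported for fd. Falls back to 4096 bytes
 * (an assumed default) if fstat() fails or reports nothing useful.
 */
size_t pick_read_size(int fd, unsigned multiple)
{
    struct stat st;
    size_t blk = 4096;

    if (fstat(fd, &st) == 0 && st.st_blksize > 0)
        blk = (size_t)st.st_blksize;
    if (multiple == 0)
        multiple = 1;            /* treat 0 as "one block" */

    return blk * multiple;
}
```

For example, pick_read_size(fd, 2) would request two blocks per read()
call; whether one, two or more blocks works best would have to be
measured.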



* re: disk-based fds in select/poll
@ 2001-06-05  0:27 Dan Kegel
  0 siblings, 0 replies; 5+ messages in thread
From: Dan Kegel @ 2001-06-05  0:27 UTC (permalink / raw)
  To: Pierre Phaneuf, linux-kernel@vger.kernel.org

Pierre Phaneuf <pp@ludusdesign.com> wrote:
> It's fairly widely-known that select/poll returns immediately when
> testing a filesystem-based file descriptor for writability or
> readability.
> 
> On top of this, even when in non-blocking mode, read() could block if
> the pages needed aren't in core. sendfile() behaves in a similar way.
> 
> What would be needed to alleviate this?
> ...
> I remember seeing a suggestion by Linus for an event-based I/O
> interface, similar to kqueue on FreeBSD but much simpler. I'd just say
> "I want it too!", ok? :-)
>
> SGI's AIO might be a solution here, does it use threads? I'm trying to
> avoid context switching as much as possible, to keep the CPU cache as
> warm as possible.

IMHO, you want AIO.  SGI's is fine for now.  I hear rumors that there will be
something even better coming in 2.5, though I have no details.

Or you could use explicit userspace threads... say, divide up your
network connections among 8 or so threads.  Then if one thread blocks,
the others are there to usefully soak up the CPU time.
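
A bare-bones C sketch of that partitioning, using POSIX threads. The
struct worker layout, the names worker_main() and run_workers(), and
the per-fd placeholder loop are all assumptions; a real worker would
run its own select()/poll() loop over its slice of connections:

```c
#include <pthread.h>
#include <stddef.h>

#define NUM_WORKERS 8   /* "8 or so threads" */

struct worker {
    pthread_t tid;
    const int *fds;     /* this worker's slice of the fd array */
    size_t nfds;
    size_t handled;
};

static void *worker_main(void *arg)
{
    struct worker *w = arg;

    /* Placeholder: a real worker would loop in select()/poll() over
     * w->fds; if it blocks on disk, the other workers keep running. */
    for (size_t i = 0; i < w->nfds; i++)
        w->handled++;
    return NULL;
}

/*
 * Split fds into NUM_WORKERS contiguous slices and run one thread per
 * slice. Returns the total number of fds handled, or -1 if a thread
 * could not be created.
 */
long run_workers(const int *fds, size_t nfds)
{
    struct worker w[NUM_WORKERS];
    size_t base = nfds / NUM_WORKERS, extra = nfds % NUM_WORKERS;
    size_t next = 0;
    int started = 0, failed = 0;
    long total = 0;

    for (int i = 0; i < NUM_WORKERS; i++) {
        w[i].nfds = base + ((size_t)i < extra ? 1 : 0);
        w[i].fds = fds + next;
        w[i].handled = 0;
        next += w[i].nfds;
        if (pthread_create(&w[i].tid, NULL, worker_main, &w[i]) != 0) {
            failed = 1;
            break;
        }
        started++;
    }

    for (int i = 0; i < started; i++) {
        pthread_join(w[i].tid, NULL);
        total += (long)w[i].handled;
    }
    return failed ? -1 : total;
}
```

The fixed, contiguous split keeps each connection on one thread for its
whole lifetime, which avoids locking around per-connection state at the
price of uneven load if some slices are busier than others.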

Readiness events for readahead completion on disk files used to
seem like a neat idea to me, but now AIO seems more appealing
in the long run, since it handles random access properly.

- Dan

